Part II: Retrieving Data

Chapter 5: Aggregation and Grouping


Summarizing data is essential when you need high-level insights from large tables. Aggregate functions let you collapse detailed rows into single metrics—like totals, averages, or counts. Grouping then partitions those rows into buckets for segmented analysis. In this chapter, we’ll explore:

  • Core aggregate functions: COUNT, SUM, AVG, MIN, MAX

  • Using GROUP BY to create logical buckets

  • Filtering groups with HAVING

  • Handling NULL values within aggregations

  • Practical examples for generating charts and reports

1. Aggregation Functions Overview

Aggregate functions process multiple rows to produce a single summary value. They ignore row-level granularity and calculate metrics across a set:

  • COUNT(expr) returns the number of rows where expr is not NULL; COUNT(*) counts all rows

  • SUM(expr) adds numeric values across rows

  • AVG(expr) computes the average of numeric values

  • MIN(expr) finds the smallest value

  • MAX(expr) finds the largest value

Each function can be applied to a column or an expression; COUNT also accepts *. If a SELECT list mixes aggregates with non-aggregated columns, those columns must appear in a GROUP BY clause.

sql
SELECT
  COUNT(*)     AS total_orders,
  SUM(amount)  AS total_revenue,
  AVG(amount)  AS average_order,
  MIN(amount)  AS smallest_order,
  MAX(amount)  AS largest_order
FROM orders;

2. Counting Rows and Calculating Sums

2.1 COUNT

  • COUNT(*) counts all rows, including rows that contain NULL values

  • COUNT(column) counts only non-NULL values in that column

sql
-- Total number of orders
SELECT COUNT(*) AS order_count
FROM orders;
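To make the difference between the two forms concrete, here is a minimal sketch. The sample_orders rows and the shipped_at column are hypothetical and exist only for this illustration; the inline VALUES list stands in for a real table so the query can run on its own.

```sql
-- Hypothetical sample data: two of the four orders have no shipped_at value.
WITH sample_orders (order_id, customer_id, shipped_at) AS (
  VALUES
    (1, 101, '2025-01-03'),
    (2, 101, NULL),
    (3, 102, '2025-01-05'),
    (4, 103, NULL)
)
SELECT
  COUNT(*)          AS all_orders,      -- 4: every row is counted
  COUNT(shipped_at) AS shipped_orders   -- 2: rows with NULL shipped_at are skipped
FROM sample_orders;
```

The gap between the two numbers (here, 2) is the count of rows where the column is NULL.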

2.2 SUM

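  • SUM(column) returns the total of numeric values, ignoring NULL; if there are no non-NULL values to add, the result is NULL, not zero

  • Wrap the result in COALESCE, as in COALESCE(SUM(column), 0), to get zero instead of NULL in that case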

sql
-- Total revenue from completed orders
SELECT SUM(amount) AS revenue
FROM orders
WHERE status = 'completed';
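One case worth checking is a filter that matches no rows at all. The sketch below uses hypothetical sample rows and a 'refunded' status that none of them has, to show where the COALESCE needs to go.

```sql
-- Hypothetical sample data; no row has status = 'refunded'.
WITH sample_orders (order_id, amount, status) AS (
  VALUES
    (1, 40.00, 'completed'),
    (2, NULL,  'completed'),
    (3, 25.00, 'pending')
)
SELECT
  SUM(amount)              AS raw_sum,          -- NULL: there is nothing to add
  SUM(COALESCE(amount, 0)) AS sum_of_coalesced, -- still NULL: no rows reach SUM
  COALESCE(SUM(amount), 0) AS coalesced_sum     -- 0: the NULL total is replaced
FROM sample_orders
WHERE status = 'refunded';
```

Placing COALESCE outside the aggregate is the form that guarantees a numeric result when the filtered set is empty.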

3. Averages, Minimums, and Maximums

3.1 AVG

  • Calculates the mean of non-NULL numeric values

  • Often returns more decimal places than you want to display; wrap it in ROUND() when formatting output

sql
SELECT AVG(amount) AS avg_order_value
FROM orders
WHERE order_date >= '2025-01-01';
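Because AVG skips NULL values, it can differ from dividing the total by the row count. A small sketch with hypothetical sample rows, assuming a database where ROUND accepts a number of decimal places (as PostgreSQL and SQLite do):

```sql
-- Hypothetical sample data: one of the four amounts is missing.
WITH sample_orders (order_id, amount) AS (
  VALUES
    (1, 10.00),
    (2, 20.00),
    (3, NULL),
    (4, 25.00)
)
SELECT
  ROUND(AVG(amount), 2)            AS avg_known_amounts, -- 18.33 = 55 / 3 non-NULL values
  ROUND(SUM(amount) / COUNT(*), 2) AS avg_over_all_rows  -- 13.75 = 55 / 4 rows
FROM sample_orders;
```

Which figure is the right one depends on whether a missing amount means "unknown" or "zero" in your data.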

3.2 MIN and MAX

  • MIN(column) finds the smallest value

  • MAX(column) finds the largest value

sql
SELECT
  MIN(order_date) AS first_order,
  MAX(order_date) AS last_order
FROM orders
WHERE customer_id = 123;

4. Grouping Data into Buckets with GROUP BY

The GROUP BY clause partitions rows by one or more columns. Each unique combination becomes a bucket for aggregation.

sql
SELECT
  customer_id,
  COUNT(*)      AS orders_placed,
  SUM(amount)   AS total_spent
FROM orders
GROUP BY customer_id;

– You can group by multiple columns:

sql
GROUP BY year, month, product_category

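– Every non-aggregated column in SELECT must appear in GROUP BY. Some databases relax this rule for columns that are functionally dependent on the grouping key, such as the other columns of a table grouped by its primary key.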
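As an illustration of that rule, the sketch below shows a query that standard SQL rejects (left as a comment) and a corrected version that groups by both selected columns. The sample rows are hypothetical.

```sql
-- Hypothetical sample data.
WITH sample_orders (customer_id, status, amount) AS (
  VALUES
    (1, 'completed', 50.00),
    (1, 'completed', 30.00),
    (1, 'cancelled', 20.00),
    (2, 'completed', 75.00)
)
-- Rejected by standard SQL: status is neither aggregated nor grouped.
--   SELECT customer_id, status, SUM(amount)
--   FROM sample_orders
--   GROUP BY customer_id;
SELECT
  customer_id,
  status,
  SUM(amount) AS total_spent
FROM sample_orders
GROUP BY customer_id, status   -- one bucket per (customer_id, status) pair
ORDER BY customer_id, status;
-- Rows: (1, cancelled, 20.00), (1, completed, 80.00), (2, completed, 75.00)
```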

5. Filtering Aggregated Groups with HAVING

WHERE filters individual rows before aggregation; HAVING filters groups after aggregation.

sql
-- Top customers who spent over $1,000
SELECT
  customer_id,
  SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;

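Reserve HAVING for conditions on aggregate results. A condition on plain columns belongs in WHERE, where it removes rows before they are grouped; this usually performs better because less data reaches the aggregation step.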
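The two clauses can work together in one query. In this sketch, built on hypothetical sample rows, WHERE discards non-completed orders first, and HAVING then keeps only customers whose remaining total exceeds 1000.

```sql
-- Hypothetical sample data.
WITH sample_orders (customer_id, amount, status) AS (
  VALUES
    (1,  600.00, 'completed'),
    (1,  700.00, 'completed'),
    (2,  900.00, 'completed'),
    (2,  500.00, 'cancelled'),
    (3, 1200.00, 'cancelled')
)
SELECT
  customer_id,
  SUM(amount) AS total_spent
FROM sample_orders
WHERE status = 'completed'   -- row filter: applied before grouping
GROUP BY customer_id
HAVING SUM(amount) > 1000    -- group filter: applied after aggregation
ORDER BY customer_id;
-- Only customer 1 qualifies, with total_spent = 1300.00
```

Customer 2 would pass the 1000 threshold if cancelled orders were counted, which is why the row-level condition has to run first.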

6. Dealing with NULLs in Grouped Data

NULL values can skew counts and averages:

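  • COUNT(column) ignores NULL values; COUNT(*) counts every row regardless of NULLs

  • SUM(column) and AVG(column) skip NULL values, so AVG divides by the number of non-NULL values, not the number of rows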

  • Use COALESCE(column, default) to replace NULL before aggregation

sql
SELECT
  region,
  COUNT(*)                     AS total_customers,
  COUNT(email)                 AS customers_with_email,
  SUM(COALESCE(sales, 0))      AS total_sales
FROM customers
GROUP BY region;

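Because SUM already skips NULL values, the COALESCE here only changes the result for a region whose sales values are all NULL: that region shows a total of 0 rather than NULL. With AVG the effect is larger, since replacing NULL with zero adds those rows to the divisor.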
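Two NULL behaviors in grouped queries are easy to check with a small sketch: rows whose grouping key is NULL fall into a single group of their own, and AVG changes when NULL is replaced with zero. The sample rows and the 'Unknown' label are hypothetical.

```sql
-- Hypothetical sample data: two customers have no region, one has no sales value.
WITH sample_customers (region, sales) AS (
  VALUES
    ('East', 100.00),
    ('East', NULL),
    (NULL,    50.00),
    (NULL,    30.00)
)
SELECT
  COALESCE(region, 'Unknown') AS region_label,
  COUNT(*)                    AS total_customers,
  AVG(sales)                  AS avg_known_sales,   -- NULL sales are left out
  AVG(COALESCE(sales, 0))     AS avg_null_as_zero   -- NULL sales count as 0
FROM sample_customers
GROUP BY region               -- all NULL regions form one group
ORDER BY region_label;
-- East:    2 customers, avg_known_sales = 100, avg_null_as_zero = 50
-- Unknown: 2 customers, avg_known_sales = 40,  avg_null_as_zero = 40
```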

7. Practical Examples

7.1 Monthly Sales Summary

sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(amount)                      AS revenue,
  COUNT(*)                         AS orders_count,
  AVG(amount)                      AS avg_order
FROM orders
GROUP BY DATE_TRUNC('month', order_date)  -- repeat the expression; grouping by a SELECT alias is not portable
ORDER BY month;

7.2 Product Category Performance

sql
SELECT
  p.category,
  SUM(oi.quantity)    AS units_sold,
  SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.category
HAVING SUM(oi.quantity) > 100;

8. Generating Charts and Reports

Aggregated query results are the basis for dashboards and visualizations:

  • Export results to BI tools (Tableau, Power BI) via CSV or direct connections

  • Build time-series charts from monthly or quarterly buckets

  • Compare categories side by side using bar or pie charts

  • Highlight outliers with conditional formatting

Example CSV export:

sql
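-- PostgreSQL: COPY ... TO writes the file on the database server and
-- needs the matching server-side privileges. From psql, \copy writes
-- the file on the client machine instead.
COPY (
  SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount)                     AS revenue
  FROM orders
  GROUP BY DATE_TRUNC('month', order_date)
  ORDER BY month
) TO '/tmp/monthly_revenue.csv' CSV HEADER;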

9. Performance Considerations and Best Practices

  • Index grouping columns: Can speed up grouping and filtering on large datasets, depending on the query plan the database chooses

  • Pre-aggregate: Use materialized views for expensive, frequently queried summaries

  • Avoid unnecessary columns: Selecting only aggregates and group keys reduces I/O

  • Use sampling: For interactive analysis on massive tables, sample a subset
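The practices above might look like the following in PostgreSQL. This is a sketch under stated assumptions, not a tuning recipe: demo_orders, the index, and the materialized view are hypothetical names created only for this example, and whether each step helps depends on your data volume and query patterns.

```sql
-- PostgreSQL sketch; every object name below is hypothetical.
CREATE TABLE demo_orders (
  order_id    integer PRIMARY KEY,
  customer_id integer NOT NULL,
  order_date  date    NOT NULL,
  amount      numeric(10, 2)
);

-- Index a column that is frequently used in GROUP BY or WHERE.
CREATE INDEX idx_demo_orders_customer_id
  ON demo_orders (customer_id);

-- Pre-aggregate an expensive summary once, then query the stored result.
CREATE MATERIALIZED VIEW demo_monthly_revenue AS
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(amount)                     AS revenue
FROM demo_orders
GROUP BY DATE_TRUNC('month', order_date);

-- The stored result does not update itself; refresh it after new data arrives.
REFRESH MATERIALIZED VIEW demo_monthly_revenue;

-- Approximate an average from roughly 1% of the table's storage pages.
SELECT AVG(amount) AS approx_avg_order
FROM demo_orders TABLESAMPLE SYSTEM (1);
```

A sampled result is an estimate, so it suits exploratory analysis better than figures that feed a report.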

Conclusion

Aggregation and grouping turn row-level details into summaries you can act on. With COUNT, SUM, AVG, MIN, and MAX, plus GROUP BY and HAVING, you can condense large tables of transactions into a handful of metrics. Handling NULLs deliberately keeps those metrics accurate, and the practices above help keep the queries fast. In the next chapter, we’ll explore subqueries and derived tables to layer more complex analyses.
