Part II: Retrieving Data
Chapter 5: Aggregation and Grouping
Summarizing data is essential when you need high-level insights from large tables. Aggregate functions let you collapse detailed rows into single metrics—like totals, averages, or counts. Grouping then partitions those rows into buckets for segmented analysis. In this chapter, we’ll explore:
Core aggregate functions:
COUNT,SUM,AVG,MIN,MAXUsing
GROUP BYto create logical bucketsFiltering groups with
HAVINGHandling
NULLvalues within aggregationsPractical examples for generating charts and reports
1. Aggregation Functions Overview
Aggregate functions process multiple rows to produce a single summary value. They ignore row-level granularity and calculate metrics across a set:
COUNT(expr)returns the number of non-NULLvalues or*for all rowsSUM(expr)adds numeric values across rowsAVG(expr)computes the average of numeric valuesMIN(expr)finds the smallest valueMAX(expr)finds the largest value
Each function can be applied to a column, an expression, or * in the case of COUNT. You cannot mix aggregates and non-aggregates without grouping.
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS average_order,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM orders;
2. Counting Rows and Calculating Sums
2.1 COUNT
COUNT(*)counts all rows, including those withNULLvalues in other columnsCOUNT(column)counts only non-NULLvalues in that column
-- Total number of orders
SELECT COUNT(*) AS order_count
FROM orders;
2.2 SUM
SUM(column)returns the total of numeric values, ignoringNULLCan be wrapped in
COALESCEto treatNULLas zero
-- Total revenue from completed orders
SELECT SUM(amount) AS revenue
FROM orders
WHERE status = 'completed';
3. Averages, Minimums, and Maximums
3.1 AVG
Calculates the mean of non-
NULLnumeric valuesSubject to rounding—use
ROUND()for formatting
SELECT AVG(amount) AS avg_order_value
FROM orders
WHERE order_date >= '2025-01-01';
3.2 MIN and MAX
MIN(column)finds the smallest valueMAX(column)finds the largest value
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders
WHERE customer_id = 123;
4. Grouping Data into Buckets with GROUP BY
The GROUP BY clause partitions rows by one or more columns. Each unique combination becomes a bucket for aggregation.
SELECT
customer_id,
COUNT(*) AS orders_placed,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
– You can group by multiple columns:
GROUP BY year, month, product_category
– All non-aggregate columns in SELECT must appear in GROUP BY.
5. Filtering Aggregated Groups with HAVING
WHERE filters individual rows before aggregation; HAVING filters groups after aggregation.
-- Top customers who spent over $1,000
SELECT
customer_id,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;
Use HAVING sparingly—pushing filters into WHERE (when possible) usually performs better.
6. Dealing with NULLs in Grouped Data
NULL values can skew counts and averages:
COUNT(column)ignoresNULL;COUNT(*)includes themSUM(NULL)andAVG(NULL)skipNULLrowsUse
COALESCE(column, default)to replaceNULLbefore aggregation
SELECT
region,
COUNT(*) AS total_customers,
COUNT(email) AS customers_with_email,
SUM(COALESCE(sales, 0)) AS total_sales
FROM customers
GROUP BY region;
This ensures that missing sales values are treated as zero rather than excluded.
7. Practical Examples
7.1 Monthly Sales Summary
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue,
COUNT(*) AS orders_count,
AVG(amount) AS avg_order
FROM orders
GROUP BY month
ORDER BY month;
7.2 Product Category Performance
SELECT
p.category,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.category
HAVING SUM(oi.quantity) > 100;
8. Generating Charts and Reports
Aggregated query results are the basis for dashboards and visualizations:
Export results to BI tools (Tableau, Power BI) via CSV or direct connections
Build time-series charts from monthly or quarterly buckets
Compare categories side by side using bar or pie charts
Highlight outliers with conditional formatting
Example CSV export:
COPY (
SELECT month, revenue
FROM (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY month
) sub
ORDER BY month
) TO '/tmp/monthly_revenue.csv' CSV HEADER;
9. Performance Considerations and Best Practices
Index grouping columns: Speeds up grouping on large datasets
Pre-aggregate: Use materialized views for expensive, frequently queried summaries
Avoid unnecessary columns: Selecting only aggregates and group keys reduces I/O
Use sampling: For interactive analysis on massive tables, sample a subset
Conclusion
Aggregation and grouping transform row-level details into actionable insights at a glance. With COUNT, SUM, AVG, MIN, MAX, plus GROUP BY and HAVING, you can summarize millions of transactions in seconds. Handling NULLs and applying best practices ensures accuracy and performance. In the next chapter, we’ll explore subqueries and derived tables to layer complex analyses seamlessly.

Comments
Post a Comment