Chapter 6: Subqueries and Derived Tables
Complex business questions often require breaking problems into smaller, composable parts. SQL subqueries and derived tables—sometimes called inline views—offer a clean way to nest queries, test membership, and encapsulate logic. In this chapter, you’ll learn how to:
Write scalar and correlated subqueries
Use
EXISTSandNOT EXISTSfor efficient membership testsCreate derived tables for readability and modularity
Nest queries to filter data based on other query results
By the end, you’ll tackle multi-step analyses with maintainable, self-documenting SQL.
1. Understanding Subqueries
A subquery is a query enclosed in parentheses that returns data to the outer (parent) query. There are two main flavors:
Scalar subqueries return a single value.
Correlated subqueries reference columns from the outer query and re-execute per row.
1.1 Scalar Subqueries
Use scalar subqueries when you need a single aggregated or computed value:
SELECT
o.order_id,
o.total_amount,
(
SELECT AVG(total_amount)
FROM orders
) AS avg_order
FROM orders o
WHERE o.order_date = CURRENT_DATE;
Here, the subquery computes the overall average just once and attaches it to every row. Scalar subqueries must return exactly one column and at most one row; otherwise you’ll get errors.
1.2 Correlated Subqueries
A correlated subquery runs once for each row of the outer query, referencing one or more outer columns:
SELECT
e.employee_id,
e.name,
(
SELECT COUNT(*)
FROM orders o
WHERE o.sales_rep_id = e.employee_id
) AS orders_count
FROM employees e;
For each employee, the subquery counts how many orders they handled. While powerful, correlated subqueries can be slower on large tables because they execute repeatedly.
2. EXISTS and NOT EXISTS for Membership Tests
Rather than counting matches, EXISTS and NOT EXISTS test for the presence or absence of related rows. These constructs often outperform IN or correlated aggregations.
2.1 EXISTS
Return rows in the outer query only if the subquery finds at least one match:
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
This lists only customers who have placed orders.
2.2 NOT EXISTS
Exclude outer rows when the subquery returns any results:
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
Here, you get customers with zero orders—ideal for targeting dormant accounts.
3. Derived Tables (Inline Views)
A derived table is a subquery used in the FROM clause, given an alias, and treated like a real table. Derived tables improve readability by encapsulating complex joins or aggregations.
3.1 Basic Syntax
SELECT dt.department, dt.avg_salary
FROM (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dt
JOIN departments d
ON dt.department_id = d.department_id
ORDER BY dt.avg_salary DESC;
The inner query (dt) computes average salaries per department. The outer query enriches this with department names.
3.2 Advantages of Derived Tables
Isolation of logic: Aggregations or filters live in one place.
Reusability in the same query: You can reference
dtmultiple times.Clarity: Outer SELECT focuses on business output, not intermediate steps.
4. Nesting Queries for Advanced Filtering
Subqueries and derived tables can be nested to handle multi-step analyses—filter by aggregates, then join or filter again.
4.1 Filtering on Aggregates
You cannot use aggregates directly in WHERE; you use a derived table or CTE:
SELECT o.customer_id, o.order_id, o.total_amount
FROM orders o
JOIN (
SELECT customer_id, MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id
) AS latest
ON o.customer_id = latest.customer_id
AND o.order_date = latest.last_order;
This returns each customer’s most recent order.
4.2 Multi-Level Nesting
For highly complex logic, you can nest multiple layers:
SELECT sq.product_id, sq.revenue_rank
FROM (
SELECT
product_id,
RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
FROM (
SELECT
oi.product_id,
SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM order_items oi
GROUP BY oi.product_id
) AS aggregated
) AS sq
WHERE sq.revenue_rank <= 10;
Inner derived table (
aggregated) computes total revenue per product.Middle derived table ranks products by revenue.
Outer query filters for the top 10 products.
While Common Table Expressions (CTEs) often improve readability, inline derived tables accomplish the same nesting without requiring a separate WITH clause.
5. Performance and Best Practices
Prefer EXISTS for Membership:
EXISTSstops at first match;INmay scan all matches.Avoid Unnecessary Correlated Subqueries: When possible, replace with derived tables or joins.
Limit Derived Table Size: Push filters into the inner query to minimize rows.
Index Join and Filter Columns: Ensure columns used in subquery predicates are indexed.
Test Execution Plans: Use
EXPLAINorEXPLAIN ANALYZEto compare subquery vs. join performance.Consider CTEs for Complex Nesting: Readability vs. inline views—choose based on team conventions and database support.
6. Real-World Examples
6.1 Identifying Repeat Customers
SELECT customer_id, name
FROM customers c
WHERE (
SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id
) > 5;
6.2 Listing Products Never Ordered
SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
6.3 Calculating Year-Over-Year Growth
SELECT
e.product_id,
((current_year.revenue - prev_year.revenue) / prev_year.revenue) * 100 AS yoy_growth
FROM (
SELECT product_id, SUM(amount) AS revenue
FROM sales
WHERE sale_date BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY product_id
) AS current_year
JOIN (
SELECT product_id, SUM(amount) AS revenue
FROM sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY product_id
) AS prev_year
ON current_year.product_id = prev_year.product_id;
7. Conclusion
Subqueries and derived tables let you decompose complex problems into manageable, reusable components. By mastering scalar and correlated subqueries, leveraging EXISTS/NOT EXISTS for membership checks, and harnessing inline views for clarity, you write cleaner, more maintainable SQL. As you encounter ever more intricate reporting and analysis requirements, these techniques will keep your queries both powerful and readable.

Comments
Post a Comment