Part II: Retrieving Data Chapter 6: Subqueries and Derived Tables

Chapter 6: Subqueries and Derived Tables

Complex business questions often require breaking problems into smaller, composable parts. SQL subqueries and derived tables—sometimes called inline views—offer a clean way to nest queries, test membership, and encapsulate logic. In this chapter, you’ll learn how to:

Write scalar and correlated subqueries
Use EXISTS and NOT EXISTS for efficient membership tests
Create derived tables for readability and modularity
Nest queries to filter data based on other query results

By the end, you’ll tackle multi-step analyses with maintainable, self-documenting SQL.

1. Understanding Subqueries

A subquery is a query enclosed in parentheses that returns data to the outer (parent) query. There are two main flavors:

Scalar subqueries return a single value.
Correlated subqueries reference columns from the outer query and re-execute per row.

1.1 Scalar Subqueries

Use scalar subqueries when you need a single aggregated or computed value:

sql

SELECT
  o.order_id,
  o.total_amount,
  (
    SELECT AVG(total_amount)
    FROM orders
  ) AS avg_order
FROM orders o
WHERE o.order_date = CURRENT_DATE;

Here, the subquery computes the overall average just once and attaches it to every row. Scalar subqueries must return exactly one column and at most one row; otherwise you’ll get errors.

1.2 Correlated Subqueries

A correlated subquery runs once for each row of the outer query, referencing one or more outer columns:

sql

SELECT
  e.employee_id,
  e.name,
  (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.sales_rep_id = e.employee_id
  ) AS orders_count
FROM employees e;

For each employee, the subquery counts how many orders they handled. While powerful, correlated subqueries can be slower on large tables because they execute repeatedly.

2. EXISTS and NOT EXISTS for Membership Tests

Rather than counting matches, EXISTS and NOT EXISTS test for the presence or absence of related rows. These constructs often outperform IN or correlated aggregations.

2.1 EXISTS

Return rows in the outer query only if the subquery finds at least one match:

sql

SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

This lists only customers who have placed orders.

2.2 NOT EXISTS

Exclude outer rows when the subquery returns any results:

sql

SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

Here, you get customers with zero orders—ideal for targeting dormant accounts.

3. Derived Tables (Inline Views)

A derived table is a subquery used in the FROM clause, given an alias, and treated like a real table. Derived tables improve readability by encapsulating complex joins or aggregations.

3.1 Basic Syntax

sql

SELECT dt.department, dt.avg_salary
FROM (
  SELECT department_id, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department_id
) AS dt
JOIN departments d
  ON dt.department_id = d.department_id
ORDER BY dt.avg_salary DESC;

The inner query (dt) computes average salaries per department. The outer query enriches this with department names.

3.2 Advantages of Derived Tables

Isolation of logic: Aggregations or filters live in one place.
Reusability in the same query: You can reference dt multiple times.
Clarity: Outer SELECT focuses on business output, not intermediate steps.

4. Nesting Queries for Advanced Filtering

Subqueries and derived tables can be nested to handle multi-step analyses—filter by aggregates, then join or filter again.

4.1 Filtering on Aggregates

You cannot use aggregates directly in WHERE; you use a derived table or CTE:

sql

SELECT o.customer_id, o.order_id, o.total_amount
FROM orders o
JOIN (
  SELECT customer_id, MAX(order_date) AS last_order
  FROM orders
  GROUP BY customer_id
) AS latest
  ON o.customer_id = latest.customer_id
 AND o.order_date = latest.last_order;

This returns each customer’s most recent order.

4.2 Multi-Level Nesting

For highly complex logic, you can nest multiple layers:

sql

SELECT sq.product_id, sq.revenue_rank
FROM (
  SELECT
    product_id,
    RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
  FROM (
    SELECT
      oi.product_id,
      SUM(oi.quantity * oi.unit_price) AS total_revenue
    FROM order_items oi
    GROUP BY oi.product_id
  ) AS aggregated
) AS sq
WHERE sq.revenue_rank <= 10;

Inner derived table (aggregated) computes total revenue per product.
Middle derived table ranks products by revenue.
Outer query filters for the top 10 products.

While Common Table Expressions (CTEs) often improve readability, inline derived tables accomplish the same nesting without requiring a separate WITH clause.

5. Performance and Best Practices

Prefer EXISTS for Membership: EXISTS stops at first match; IN may scan all matches.
Avoid Unnecessary Correlated Subqueries: When possible, replace with derived tables or joins.
Limit Derived Table Size: Push filters into the inner query to minimize rows.
Index Join and Filter Columns: Ensure columns used in subquery predicates are indexed.
Test Execution Plans: Use EXPLAIN or EXPLAIN ANALYZE to compare subquery vs. join performance.
Consider CTEs for Complex Nesting: Readability vs. inline views—choose based on team conventions and database support.

6. Real-World Examples

6.1 Identifying Repeat Customers

sql

SELECT customer_id, name
FROM customers c
WHERE (
  SELECT COUNT(*)
  FROM orders o
  WHERE o.customer_id = c.customer_id
) > 5;

6.2 Listing Products Never Ordered

sql

SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
  SELECT 1
  FROM order_items oi
  WHERE oi.product_id = p.product_id
);

6.3 Calculating Year-Over-Year Growth

sql

SELECT
  e.product_id,
  ((current_year.revenue - prev_year.revenue) / prev_year.revenue) * 100 AS yoy_growth
FROM (
  SELECT product_id, SUM(amount) AS revenue
  FROM sales
  WHERE sale_date BETWEEN '2025-01-01' AND '2025-12-31'
  GROUP BY product_id
) AS current_year
JOIN (
  SELECT product_id, SUM(amount) AS revenue
  FROM sales
  WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
  GROUP BY product_id
) AS prev_year
  ON current_year.product_id = prev_year.product_id;

7. Conclusion

Subqueries and derived tables let you decompose complex problems into manageable, reusable components. By mastering scalar and correlated subqueries, leveraging EXISTS/NOT EXISTS for membership checks, and harnessing inline views for clarity, you write cleaner, more maintainable SQL. As you encounter ever more intricate reporting and analysis requirements, these techniques will keep your queries both powerful and readable.

Big Data and AI

Search This Blog

Part II: Retrieving Data Chapter 6: Subqueries and Derived Tables

Chapter 6: Subqueries and Derived Tables

1. Understanding Subqueries

1.1 Scalar Subqueries

1.2 Correlated Subqueries

2. EXISTS and NOT EXISTS for Membership Tests

2.1 EXISTS

2.2 NOT EXISTS

3. Derived Tables (Inline Views)

3.1 Basic Syntax

3.2 Advantages of Derived Tables

4. Nesting Queries for Advanced Filtering

4.1 Filtering on Aggregates

4.2 Multi-Level Nesting

5. Performance and Best Practices

6. Real-World Examples

6.1 Identifying Repeat Customers

6.2 Listing Products Never Ordered

6.3 Calculating Year-Over-Year Growth

7. Conclusion

Labels

Comments

Post a Comment

Popular posts from this blog

Alfred Marshall – The Father of Modern Microeconomics

Unlocking South America's Data Potential: Trends, Challenges, and Strategic Opportunities for 2025

Kickstart Your SQL Journey with Our Step-by-Step Tutorial Series