Chapter 6: Subqueries and Derived Tables



Complex business questions often require breaking problems into smaller, composable parts. SQL subqueries and derived tables—sometimes called inline views—offer a clean way to nest queries, test membership, and encapsulate logic. In this chapter, you’ll learn how to:

  • Write scalar and correlated subqueries

  • Use EXISTS and NOT EXISTS for efficient membership tests

  • Create derived tables for readability and modularity

  • Nest queries to filter data based on other query results

By the end, you’ll tackle multi-step analyses with maintainable, self-documenting SQL.

1. Understanding Subqueries

A subquery is a query enclosed in parentheses that returns data to the outer (parent) query. Two kinds come up most often, and a single subquery can be both at once:

  • Scalar subqueries return a single value.

  • Correlated subqueries reference columns from the outer query, so they are logically evaluated once per outer row.

1.1 Scalar Subqueries

Use scalar subqueries when you need a single aggregated or computed value:

sql
SELECT
  o.order_id,
  o.total_amount,
  (
    SELECT AVG(total_amount)
    FROM orders
  ) AS avg_order
FROM orders o
WHERE o.order_date = CURRENT_DATE;

Here, the subquery does not reference the outer query, so the database can compute the overall average once and attach it to every row. A scalar subquery must return exactly one column and at most one row: zero rows yields NULL, while more than one row is an error in most engines.
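
To try this end to end, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout and sample rows are assumptions invented for illustration, and a fixed date parameter stands in for CURRENT_DATE so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        total_amount REAL
    );
    INSERT INTO orders VALUES
        (1, '2025-03-01', 100.0),
        (2, '2025-03-02', 200.0),
        (3, '2025-03-02', 300.0);
""")

# The scalar subquery averages ALL orders; the outer WHERE only limits
# which rows the average is attached to.
rows = conn.execute("""
    SELECT o.order_id,
           o.total_amount,
           (SELECT AVG(total_amount) FROM orders) AS avg_order
    FROM orders o
    WHERE o.order_date = ?
    ORDER BY o.order_id
""", ("2025-03-02",)).fetchall()

# A scalar subquery that matches no rows yields NULL (None in Python).
empty = conn.execute(
    "SELECT (SELECT total_amount FROM orders WHERE order_id = 999)"
).fetchone()[0]

print(rows)
print(empty)
```

Both returned rows carry the same avg_order of 200.0, the average over all three orders rather than only the two that pass the date filter.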

1.2 Correlated Subqueries

A correlated subquery runs once for each row of the outer query, referencing one or more outer columns:

sql
SELECT
  e.employee_id,
  e.name,
  (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.sales_rep_id = e.employee_id
  ) AS orders_count
FROM employees e;

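For each employee, the subquery counts how many orders they handled. While powerful, correlated subqueries can be slow on large tables when the engine really does re-execute them for every outer row. Many optimizers rewrite them as joins, and an index on orders.sales_rep_id helps in either case.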
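
As a rough illustration, the sketch below runs the correlated count next to a LEFT JOIN plus GROUP BY rewrite and checks that both give the same answer. It uses Python's sqlite3 module with made-up sample data; the join form is one possible rewrite, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, sales_rep_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

correlated = conn.execute("""
    SELECT e.employee_id, e.name,
           (SELECT COUNT(*) FROM orders o
            WHERE o.sales_rep_id = e.employee_id) AS orders_count
    FROM employees e
    ORDER BY e.employee_id
""").fetchall()

# LEFT JOIN keeps employees with no orders; COUNT(o.order_id) skips the
# NULLs those unmatched rows produce, so such employees count as 0.
joined = conn.execute("""
    SELECT e.employee_id, e.name, COUNT(o.order_id) AS orders_count
    FROM employees e
    LEFT JOIN orders o ON o.sales_rep_id = e.employee_id
    GROUP BY e.employee_id, e.name
    ORDER BY e.employee_id
""").fetchall()

print(correlated == joined)
```

Note the COUNT(o.order_id) in the join version: COUNT(*) would report 1 for an employee with no orders, because the LEFT JOIN still emits one row for them.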

2. EXISTS and NOT EXISTS for Membership Tests

Rather than counting matches, EXISTS and NOT EXISTS test for the presence or absence of related rows. Because only existence matters, the engine can stop at the first matching row, which is often cheaper than counting every match.

2.1 EXISTS

Return rows in the outer query only if the subquery finds at least one match:

sql
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

This lists only customers who have placed orders.

2.2 NOT EXISTS

Exclude outer rows when the subquery returns any results:

sql
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

Here, you get customers with zero orders—ideal for targeting dormant accounts.
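
The two queries can be run side by side with a small sketch like the following, which uses Python's sqlite3 module and invented sample data. The helper name customers_where is hypothetical and exists only to avoid repeating the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Birch'), (3, 'Cedar');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
""")

def customers_where(predicate):
    """Run the membership query with EXISTS or NOT EXISTS as the predicate.

    Only the two fixed keywords below are ever interpolated, never user input.
    """
    sql = f"""
        SELECT c.customer_id, c.name
        FROM customers c
        WHERE {predicate} (
            SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
        )
        ORDER BY c.customer_id
    """
    return conn.execute(sql).fetchall()

with_orders = customers_where("EXISTS")
without_orders = customers_where("NOT EXISTS")

print(with_orders)
print(without_orders)
```

Acme has two orders but appears only once in with_orders. That is a practical difference from an inner join, which would return one row per matching order unless you add DISTINCT.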

3. Derived Tables (Inline Views)

A derived table is a subquery used in the FROM clause, given an alias, and treated like a real table. Derived tables improve readability by encapsulating complex joins or aggregations.

3.1 Basic Syntax

sql
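-- department_name is an assumed column of departments; dt exposes only department_id and avg_salary
SELECT d.department_name, dt.avg_salary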
FROM (
  SELECT department_id, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department_id
) AS dt
JOIN departments d
  ON dt.department_id = d.department_id
ORDER BY dt.avg_salary DESC;

The inner query (dt) computes average salaries per department. The outer query enriches this with department names.
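
Here is a small runnable sketch of this pattern using Python's sqlite3 module. The sample rows and the department_name column are assumptions made for the example; adjust them to your own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY,
        department_name TEXT
    );
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        department_id INTEGER,
        salary REAL
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Engineering');
    INSERT INTO employees VALUES (1, 1, 50000), (2, 1, 70000), (3, 2, 90000);
""")

# The derived table dt holds one row per department_id; the outer query
# joins it to departments to pick up the readable name.
rows = conn.execute("""
    SELECT d.department_name, dt.avg_salary
    FROM (
        SELECT department_id, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department_id
    ) AS dt
    JOIN departments d ON dt.department_id = d.department_id
    ORDER BY dt.avg_salary DESC
""").fetchall()

print(rows)
```

A handy habit when debugging: run the inner SELECT on its own first, confirm it returns one row per department, then wrap it in the outer query.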

3.2 Advantages of Derived Tables

  • Isolation of logic: Aggregations or filters live in one place.

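  • Reusability in the same query: The outer query can use dt's columns in SELECT, WHERE, and ORDER BY. A derived table cannot be joined to itself by alias, though; for that, repeat the subquery or use a CTE.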

  • Clarity: Outer SELECT focuses on business output, not intermediate steps.

4. Nesting Queries for Advanced Filtering

Subqueries and derived tables can be nested to handle multi-step analyses—filter by aggregates, then join or filter again.

4.1 Filtering on Aggregates

Aggregates cannot appear directly in WHERE. HAVING filters whole groups, but to compare individual rows against a per-group aggregate, join to a derived table or CTE:

sql
SELECT o.customer_id, o.order_id, o.total_amount
FROM orders o
JOIN (
  SELECT customer_id, MAX(order_date) AS last_order
  FROM orders
  GROUP BY customer_id
) AS latest
  ON o.customer_id = latest.customer_id
 AND o.order_date = latest.last_order;

This returns each customer’s most recent order. If a customer placed several orders on that latest date, all of them are returned.
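
The behavior on tied dates is easy to check with a short sketch. The one below uses Python's sqlite3 module and invented rows in which customer 2 has two orders on the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        total_amount REAL
    );
    INSERT INTO orders VALUES
        (1, 1, '2025-01-05', 40.0),
        (2, 1, '2025-02-10', 60.0),
        (3, 2, '2025-02-01', 25.0),
        (4, 2, '2025-02-01', 35.0);
""")

# Join every order to its customer's latest order_date and keep the matches.
rows = conn.execute("""
    SELECT o.customer_id, o.order_id, o.total_amount
    FROM orders o
    JOIN (
        SELECT customer_id, MAX(order_date) AS last_order
        FROM orders
        GROUP BY customer_id
    ) AS latest
      ON o.customer_id = latest.customer_id
     AND o.order_date = latest.last_order
    ORDER BY o.customer_id, o.order_id
""").fetchall()

print(rows)
```

Customer 1 yields a single row (order 2), while customer 2 yields both order 3 and order 4. If you need exactly one row per customer, add a tie-breaker such as the highest order_id.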

4.2 Multi-Level Nesting

For highly complex logic, you can nest multiple layers:

sql
SELECT sq.product_id, sq.revenue_rank
FROM (
  SELECT
    product_id,
    RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
  FROM (
    SELECT
      oi.product_id,
      SUM(oi.quantity * oi.unit_price) AS total_revenue
    FROM order_items oi
    GROUP BY oi.product_id
  ) AS aggregated
) AS sq
WHERE sq.revenue_rank <= 10;

  • Inner derived table (aggregated) computes total revenue per product.

  • Middle derived table ranks products by revenue.

  • Outer query keeps products ranked 10 or better. Because RANK() gives tied products the same rank, more than 10 rows can come back.

While Common Table Expressions (CTEs) often improve readability, inline derived tables accomplish the same nesting without requiring a separate WITH clause.
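
To make the comparison concrete, the sketch below runs the nested form and a CTE form of the same query and confirms they return identical rows. It uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions such as RANK() became available. The sample data and the cutoff of 2 (instead of 10) are chosen only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (product_id INTEGER, quantity INTEGER, unit_price REAL);
    INSERT INTO order_items VALUES
        (1, 2, 10.0), (1, 1, 10.0),
        (2, 5, 4.0),
        (3, 1, 50.0);
""")

nested_sql = """
    SELECT sq.product_id, sq.revenue_rank
    FROM (
        SELECT product_id,
               RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
        FROM (
            SELECT oi.product_id,
                   SUM(oi.quantity * oi.unit_price) AS total_revenue
            FROM order_items oi
            GROUP BY oi.product_id
        ) AS aggregated
    ) AS sq
    WHERE sq.revenue_rank <= ?
    ORDER BY sq.revenue_rank
"""

# Same logic, with each nesting level pulled out into a named CTE.
cte_sql = """
    WITH aggregated AS (
        SELECT oi.product_id,
               SUM(oi.quantity * oi.unit_price) AS total_revenue
        FROM order_items oi
        GROUP BY oi.product_id
    ),
    ranked AS (
        SELECT product_id,
               RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
        FROM aggregated
    )
    SELECT product_id, revenue_rank
    FROM ranked
    WHERE revenue_rank <= ?
    ORDER BY revenue_rank
"""

nested_rows = conn.execute(nested_sql, (2,)).fetchall()
cte_rows = conn.execute(cte_sql, (2,)).fetchall()

print(nested_rows)
print(nested_rows == cte_rows)
```

Product 3 (revenue 50) ranks first and product 1 (revenue 30) second; product 2 (revenue 20) is filtered out by the cutoff in both forms.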

5. Performance and Best Practices

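  1. Prefer EXISTS for Membership: EXISTS can stop at the first match. Many optimizers plan IN (subquery) the same way, but NOT IN returns no rows at all when the subquery produces a NULL, so NOT EXISTS is the safer default for exclusions.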

  2. Avoid Unnecessary Correlated Subqueries: When possible, replace with derived tables or joins.

  3. Limit Derived Table Size: Push filters into the inner query to minimize rows.

  4. Index Join and Filter Columns: Ensure columns used in subquery predicates are indexed.

  5. Test Execution Plans: Use EXPLAIN or EXPLAIN ANALYZE to compare subquery vs. join performance.

  6. Consider CTEs for Complex Nesting: CTEs often read better than deeply nested inline views; choose based on team conventions and database support.
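
One correctness reason to favor NOT EXISTS over NOT IN for exclusions is how NULLs behave. The sketch below, using Python's sqlite3 module and invented data in which one order has no customer_id, shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Birch');
    -- Order 101 has no customer recorded (NULL customer_id).
    INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# "2 NOT IN (1, NULL)" evaluates to NULL rather than TRUE, so Birch is dropped.
not_in = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE c.customer_id NOT IN (SELECT o.customer_id FROM orders o)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
    )
""").fetchall()

print(not_in)
print(not_exists)
```

The NOT IN query returns nothing, while NOT EXISTS correctly returns Birch. If you must use NOT IN, filter NULLs out of the subquery with WHERE o.customer_id IS NOT NULL.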

6. Real-World Examples

6.1 Identifying Repeat Customers

sql
SELECT customer_id, name
FROM customers c
WHERE (
  SELECT COUNT(*)
  FROM orders o
  WHERE o.customer_id = c.customer_id
) > 5;
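
Following the earlier advice about avoiding unnecessary correlated subqueries, this query can also be written with a derived table and HAVING. The sketch below, which uses Python's sqlite3 module and made-up data, runs both forms and checks that they agree; the alias frequent is a name chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Birch');
""")
# Acme gets 6 orders, Birch gets 2; order_id is assigned automatically.
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(1,)] * 6 + [(2,)] * 2,
)

correlated = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    WHERE (SELECT COUNT(*) FROM orders o
           WHERE o.customer_id = c.customer_id) > 5
""").fetchall()

# Aggregate once per customer, keep groups with more than 5 orders, then join.
grouped = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    JOIN (
        SELECT customer_id
        FROM orders
        GROUP BY customer_id
        HAVING COUNT(*) > 5
    ) AS frequent ON frequent.customer_id = c.customer_id
""").fetchall()

print(correlated)
print(correlated == grouped)
```

Which form runs faster depends on the engine and the data, so compare the plans with EXPLAIN before committing to one.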

6.2 Listing Products Never Ordered

sql
SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
  SELECT 1
  FROM order_items oi
  WHERE oi.product_id = p.product_id
);

6.3 Calculating Year-Over-Year Growth

sql
SELECT
  current_year.product_id,
  -- 100.0 forces decimal arithmetic; NULLIF avoids division by zero
  100.0 * (current_year.revenue - prev_year.revenue)
    / NULLIF(prev_year.revenue, 0) AS yoy_growth
FROM (
  SELECT product_id, SUM(amount) AS revenue
  FROM sales
  WHERE sale_date BETWEEN '2025-01-01' AND '2025-12-31'
  GROUP BY product_id
) AS current_year
JOIN (
  SELECT product_id, SUM(amount) AS revenue
  FROM sales
  WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
  GROUP BY product_id
) AS prev_year
  ON current_year.product_id = prev_year.product_id;
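
A worked example helps to sanity-check the arithmetic. The sketch below uses Python's sqlite3 module with invented sales rows, and it writes the growth expression with 100.0 and NULLIF as a guard against integer division and division by zero. Treat that guard as one reasonable choice rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product_id INTEGER, sale_date TEXT, amount REAL);
    -- Product 3 has no 2024 sales, so the inner join leaves it out.
    INSERT INTO sales VALUES
        (1, '2024-03-01', 100.0), (1, '2024-09-15', 100.0),
        (1, '2025-02-20', 250.0),
        (2, '2024-06-01', 80.0),
        (2, '2025-06-01', 60.0),
        (3, '2025-01-10', 40.0);
""")

rows = conn.execute("""
    SELECT current_year.product_id,
           100.0 * (current_year.revenue - prev_year.revenue)
                 / NULLIF(prev_year.revenue, 0) AS yoy_growth
    FROM (
        SELECT product_id, SUM(amount) AS revenue
        FROM sales
        WHERE sale_date BETWEEN '2025-01-01' AND '2025-12-31'
        GROUP BY product_id
    ) AS current_year
    JOIN (
        SELECT product_id, SUM(amount) AS revenue
        FROM sales
        WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
        GROUP BY product_id
    ) AS prev_year
      ON current_year.product_id = prev_year.product_id
    ORDER BY current_year.product_id
""").fetchall()

print(rows)
```

Product 1 grows from 200 to 250 (+25%), product 2 shrinks from 80 to 60 (-25%), and product 3 is absent because it has no prior-year row to join to. Use a LEFT JOIN from current_year if new products should appear with a NULL growth value.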

7. Conclusion

Subqueries and derived tables let you decompose complex problems into manageable, reusable components. By mastering scalar and correlated subqueries, leveraging EXISTS/NOT EXISTS for membership checks, and harnessing inline views for clarity, you write cleaner, more maintainable SQL. As you encounter ever more intricate reporting and analysis requirements, these techniques will keep your queries both powerful and readable.
