Chapter 8: Set Operations




Merge query results seamlessly: UNION vs. UNION ALL, INTERSECT, EXCEPT (or MINUS). Perform sophisticated multi-query analyses with ease.

Combining multiple query outputs into a single, coherent result set is a cornerstone of advanced SQL analysis. SQL’s set operations—UNION, UNION ALL, INTERSECT, and EXCEPT (also known as MINUS in some systems)—allow you to treat query results like mathematical sets. Whether you need to deduplicate rows, find overlaps, or subtract one dataset from another, set operations streamline multi-query workflows. This detailed guide covers each operator’s syntax, performance considerations, real-world use cases, and best practices.

1. The Basics of Set Operations

Before diving into each command, ensure your subqueries:

  • Return the same number of columns

  • Use compatible data types in each column position

  • List columns in the same order
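The column-count rule can be checked with a small sketch. It uses Python's built-in sqlite3 module and two hypothetical tables (web_signups, store_signups) that are not part of this chapter's schema; other engines report the mismatch with different wording.

```python
import sqlite3

# In-memory database with two hypothetical sign-up tables of different widths.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_signups (email TEXT, signup_date TEXT);
    CREATE TABLE store_signups (email TEXT);
    INSERT INTO web_signups VALUES ('ana@example.com', '2025-01-05');
    INSERT INTO store_signups VALUES ('ben@example.com');
""")

# Two columns on the left, one on the right: the engine rejects the query.
try:
    conn.execute(
        "SELECT email, signup_date FROM web_signups "
        "UNION ALL SELECT email FROM store_signups"
    )
    mismatch_rejected = False
except sqlite3.Error as exc:
    mismatch_rejected = True
    print("rejected:", exc)

# Padding the shorter query with a NULL placeholder aligns the column lists.
rows = conn.execute(
    "SELECT email, signup_date FROM web_signups "
    "UNION ALL SELECT email, NULL FROM store_signups "
    "ORDER BY email"
).fetchall()
print(rows)
```

Padding with NULL is only one way to align the lists; dropping the extra column from the wider query works equally well when the value is not needed.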

SQL set operations follow these set-theory rules:

  • UNION: combine two result sets and remove duplicates

  • UNION ALL: append all rows from both sets, preserving duplicates

  • INTERSECT: return only rows common to both sets

  • EXCEPT / MINUS: return rows in the first set that don’t appear in the second
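To make the four rules concrete, here is a minimal sketch that runs each operator against the same pair of tiny tables. It assumes Python's built-in sqlite3 module; the tables a and b and the helper run are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database; tables a and b are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(op):
    """Apply one set operator to a and b and return the sorted values."""
    sql = f"SELECT x FROM a {op} SELECT x FROM b ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

results = {op: run(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, rows in results.items():
    print(f"{op:<10} {rows}")
```

Note that table a contains the value 2 twice: UNION, INTERSECT, and EXCEPT all return it at most once, while UNION ALL keeps every copy from both inputs.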

2. UNION vs. UNION ALL: Merging Distinct or All Rows

Feature       UNION                                   UNION ALL
Duplicates    Eliminated                              Preserved
Performance   Slower (sort or hash to deduplicate)    Faster (no deduplication step)
Use case      Consolidate overlapping data            Concatenate non-overlapping data

2.1 Syntax and Examples

sql
-- UNION: distinct orders across two regions
SELECT order_id, customer_id, amount
FROM north_region_orders
UNION
SELECT order_id, customer_id, amount
FROM south_region_orders
ORDER BY order_id;

-- UNION ALL: preserve duplicates for audit
SELECT order_id, customer_id, amount
FROM north_region_orders
UNION ALL
SELECT order_id, customer_id, amount
FROM south_region_orders;

Tips:

  • Add ORDER BY only once, after the final subquery.

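  • Index the columns used in each subquery’s WHERE clause so fewer rows reach the deduplication step; the deduplication itself is a sort or hash over the combined rows.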
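One detail worth seeing in action is that UNION compares whole rows, not just order_id. The sketch below assumes Python's built-in sqlite3 module and reuses the table names from the queries above with made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE north_region_orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
    CREATE TABLE south_region_orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
    INSERT INTO north_region_orders VALUES (1, 10, 50), (2, 11, 75);
    -- Order 2 appears twice in the south: once identical, once with a different amount.
    INSERT INTO south_region_orders VALUES (2, 11, 75), (2, 11, 80), (3, 12, 20);
""")

query = """
    SELECT order_id, customer_id, amount FROM north_region_orders
    {op}
    SELECT order_id, customer_id, amount FROM south_region_orders
    ORDER BY order_id, amount
"""
distinct_rows = conn.execute(query.format(op="UNION")).fetchall()
all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(len(distinct_rows), distinct_rows)
print(len(all_rows), all_rows)
```

UNION collapses the two identical copies of order 2 but keeps the copy whose amount differs, so a row that conflicts between regions still surfaces in the output.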

3. INTERSECT: Finding Common Records

INTERSECT returns the distinct rows that appear in the results of both queries. It behaves like a logical AND applied to whole rows.

3.1 Syntax

sql
SELECT customer_email
FROM newsletter_subscribers
INTERSECT
SELECT customer_email
FROM recent_purchasers;

This query lists emails of subscribers who also made recent purchases.
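A quick way to see how INTERSECT differs from an inner join on the same column is to feed both some duplicate rows. This is a sketch using Python's built-in sqlite3 module; the email addresses are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsletter_subscribers (customer_email TEXT);
    CREATE TABLE recent_purchasers (customer_email TEXT);
    INSERT INTO newsletter_subscribers VALUES
        ('ana@example.com'), ('ben@example.com'), ('ben@example.com'), ('cy@example.com');
    INSERT INTO recent_purchasers VALUES
        ('ben@example.com'), ('ben@example.com'), ('cy@example.com'), ('dee@example.com');
""")

# INTERSECT returns each common email once, however often it repeats.
intersect_rows = conn.execute("""
    SELECT customer_email FROM newsletter_subscribers
    INTERSECT
    SELECT customer_email FROM recent_purchasers
    ORDER BY customer_email
""").fetchall()

# An inner join returns one row per matching pair, so duplicates multiply.
join_rows = conn.execute("""
    SELECT s.customer_email
    FROM newsletter_subscribers AS s
    JOIN recent_purchasers AS p ON p.customer_email = s.customer_email
    ORDER BY s.customer_email
""").fetchall()

print(intersect_rows)
print(len(join_rows))
```

With two copies of the same address on each side, the join produces four rows for it, while INTERSECT produces one.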

3.2 Use Cases and Performance

  • Cross-list precision: Identify customers present in multiple marketing lists.

  • Data validation: Find records recorded in both transactional and archival tables.

Performance tips:

  • Pre-filter each set with WHERE to reduce row counts.

  • An index covering the selected columns can help some engines avoid a full sort; check the query plan rather than assuming the index is used.

4. EXCEPT (or MINUS): Subtracting One Set from Another

Use EXCEPT (ANSI SQL) or MINUS (Oracle; Oracle 21c and later also accept EXCEPT) to return the distinct rows in the first result set that do not appear in the second.

4.1 Syntax

sql
-- ANSI SQL
SELECT customer_id
FROM all_customers
EXCEPT
SELECT customer_id
FROM blacklisted_customers;

-- Oracle-specific
SELECT customer_id
FROM all_customers
MINUS
SELECT customer_id
FROM blacklisted_customers;

4.2 Practical Examples

  • Compliance audits: List employees active this month but missing mandatory training.

  • Data cleanup: Identify orphaned child records before deletion.

Vendor notes:

  • MySQL added INTERSECT and EXCEPT in version 8.0.31; on earlier versions, simulate EXCEPT with LEFT JOIN ... IS NULL or NOT EXISTS.
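The two workarounds can be compared against EXCEPT directly. The sketch below runs on SQLite through Python's built-in sqlite3 module (SQLite supports EXCEPT, which makes the comparison possible); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_customers (customer_id INTEGER);
    CREATE TABLE blacklisted_customers (customer_id INTEGER);
    INSERT INTO all_customers VALUES (1), (2), (3), (3), (4);
    INSERT INTO blacklisted_customers VALUES (2), (4);
""")

except_rows = conn.execute("""
    SELECT customer_id FROM all_customers
    EXCEPT
    SELECT customer_id FROM blacklisted_customers
    ORDER BY customer_id
""").fetchall()

# NOT EXISTS rewrite; DISTINCT reproduces the deduplication EXCEPT performs.
not_exists_rows = conn.execute("""
    SELECT DISTINCT c.customer_id
    FROM all_customers AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM blacklisted_customers AS b
        WHERE b.customer_id = c.customer_id
    )
    ORDER BY c.customer_id
""").fetchall()

# LEFT JOIN ... IS NULL rewrite: keep rows that found no match.
left_join_rows = conn.execute("""
    SELECT DISTINCT c.customer_id
    FROM all_customers AS c
    LEFT JOIN blacklisted_customers AS b ON b.customer_id = c.customer_id
    WHERE b.customer_id IS NULL
    ORDER BY c.customer_id
""").fetchall()

print(except_rows, not_exists_rows, left_join_rows)
```

Without DISTINCT, the rewrites would return customer 3 twice. The rewrites also compare with =, so if customer_id can be NULL they keep a NULL row that EXCEPT would remove when the second set also contains NULL.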

5. Chaining and Nesting Set Operations

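You can chain and nest set operations, using parentheses to control precedence. In standard SQL, INTERSECT binds more tightly than UNION and EXCEPT, which share the same precedence and are evaluated left to right. Some engines, such as SQLite, evaluate all set operators left to right with equal precedence, so always use parentheses for complex logic.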

sql
(
  SELECT sku FROM january_sales
  INTERSECT
  SELECT sku FROM february_sales
)
EXCEPT
SELECT sku FROM discontinued_products;

This returns SKUs sold in both January and February, excluding discontinued items.
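The effect of grouping can be demonstrated with three small sets. This sketch uses Python's built-in sqlite3 module; because SQLite does not accept a parenthesized query as a direct operand of a set operator, the grouping is expressed with derived tables instead. The tables a, b, and c are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    CREATE TABLE c (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
    INSERT INTO c VALUES (3), (4);
""")

def values(sql):
    """Run a query and return its single column as a list."""
    return [row[0] for row in conn.execute(sql)]

# a UNION (b INTERSECT c): the grouping standard SQL precedence implies.
intersect_first = values("""
    SELECT x FROM a
    UNION
    SELECT x FROM (SELECT x FROM b INTERSECT SELECT x FROM c)
    ORDER BY x
""")

# (a UNION b) INTERSECT c: strict left-to-right grouping.
union_first = values("""
    SELECT x FROM (SELECT x FROM a UNION SELECT x FROM b)
    INTERSECT
    SELECT x FROM c
    ORDER BY x
""")

# No explicit grouping: SQLite groups compound SELECTs left to right.
ungrouped = values("""
    SELECT x FROM a UNION SELECT x FROM b INTERSECT SELECT x FROM c
    ORDER BY x
""")

print(intersect_first, union_first, ungrouped)
```

The same three inputs give two different answers depending on grouping, and the ungrouped form follows whichever rule the engine implements, which is the practical reason to group explicitly.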

6. Best Practices and Optimization Tips

  1. Align Columns and Types

    • Match column counts and data types exactly.

    • Use CAST when necessary to ensure compatibility.

  2. Filter Early

    • Apply WHERE clauses inside each subquery to shrink data before the set operation.

  3. Choose the Right Operator

    • Use UNION ALL when deduplication isn’t needed—performance boost.

    • Reserve UNION for distinct result requirements.

  4. Handle NULLs Explicitly

    • Two NULL values are considered equal in set operations.

    • If NULL rows should not match, filter them out with IS NOT NULL; use COALESCE when NULL should be treated as a specific default value.

  5. Leverage Indexes

    • Index columns used in each subquery’s WHERE and projection lists.

    • Indexes that let each subquery read fewer rows usually matter more than anything done to the set operation itself; confirm the gain with the query plan.

  6. Use CTEs for Clarity

    • Wrap each subquery in a Common Table Expression for readable, maintainable code.

sql
WITH jan AS (
  SELECT customer_id FROM sales WHERE month = '2025-01'
),
feb AS (
  SELECT customer_id FROM sales WHERE month = '2025-02'
)
SELECT * FROM jan
INTERSECT
SELECT * FROM feb;
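The NULL-handling point in practice 4 is easy to miss because joins behave differently. The following sketch, using Python's built-in sqlite3 module and two hypothetical single-column tables, contrasts INTERSECT with an equality join on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (x INTEGER);
    INSERT INTO t1 VALUES (NULL), (1);
    INSERT INTO t2 VALUES (NULL), (2);
""")

# Set operations treat two NULLs as the same value, so the NULL row is common.
intersect_rows = conn.execute(
    "SELECT x FROM t1 INTERSECT SELECT x FROM t2"
).fetchall()

# An equality join never matches NULL to NULL, so nothing is returned.
join_rows = conn.execute(
    "SELECT t1.x FROM t1 JOIN t2 ON t1.x = t2.x"
).fetchall()

# Filtering NULLs out first makes the set operation agree with the join.
filtered_rows = conn.execute(
    "SELECT x FROM t1 WHERE x IS NOT NULL "
    "INTERSECT SELECT x FROM t2 WHERE x IS NOT NULL"
).fetchall()

print(intersect_rows, join_rows, filtered_rows)
```

Which behavior is correct depends on what NULL means in the data; the point is to choose it deliberately rather than inherit it from the operator.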

7. Real-World Scenarios

  • Marketing Campaigns:

    • UNION customer lists from web sign-ups and in-store registrations.

    • Use EXCEPT to remove contacts who unsubscribed.

  • Data Reconciliation:

    • Use FULL OUTER JOIN, or two EXCEPT queries (one in each direction) combined with UNION ALL, to find mismatches between ERP and CRM.

  • Temporal Comparisons:

    • Use INTERSECT to identify products listed in both last quarter and current inventory.
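The data reconciliation scenario can be sketched as a two-way difference: rows only in one system, plus rows only in the other. The example assumes Python's built-in sqlite3 module; the erp and crm tables, their columns, and the side labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE erp (id INTEGER, amount INTEGER);
    CREATE TABLE crm (id INTEGER, amount INTEGER);
    INSERT INTO erp VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO crm VALUES (1, 100), (2, 250), (4, 400);
""")

# Each EXCEPT finds rows present on one side only; UNION ALL stacks the two
# one-sided results, and the literal label records which side each came from.
mismatches = conn.execute("""
    SELECT 'erp_only' AS side, id, amount
    FROM (SELECT id, amount FROM erp EXCEPT SELECT id, amount FROM crm)
    UNION ALL
    SELECT 'crm_only', id, amount
    FROM (SELECT id, amount FROM crm EXCEPT SELECT id, amount FROM erp)
    ORDER BY id, side
""").fetchall()

for row in mismatches:
    print(row)
```

Record 1 matches in both systems and drops out. Record 2 appears on both sides because its amount differs, while records 3 and 4 each exist in only one system.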

8. Conclusion

SQL set operations empower you to merge, intersect, and subtract entire result sets with concise, declarative syntax. By choosing between UNION vs. UNION ALL, leveraging INTERSECT for common records, and using EXCEPT/MINUS to exclude unwanted rows, you can build sophisticated, high-performance multi-query analyses. Follow best practices—filter early, align columns, index wisely, and use CTEs—to ensure your set operations run efficiently as your data scales. Start applying these techniques today to streamline cross-table reporting, data reconciliation, and advanced analytics.
