Part III: Combining Data Across Tables in SQL

In real-world databases, data is rarely confined to a single table. To unlock deeper insights, you need to learn how to merge and compare datasets stored in different tables. In this third installment of our SQL tutorial series on Data Analyst BI, we’ll explore:

JOIN Operations: Linking rows from multiple tables based on related columns
Set Operations: Merging result sets via UNION, INTERSECT, and EXCEPT

Mastering these techniques will enable you to build richer reports, enforce data integrity, and answer complex business questions.

JOIN Operations

JOINs are the cornerstone of combining data across tables. By matching rows on key columns, you can assemble comprehensive views of your data.

1. INNER JOIN

Returns only rows that have matching keys in both tables.

sql

SELECT o.order_id,
       c.customer_name,
       o.order_date
FROM orders o
INNER JOIN customers c
  ON o.customer_id = c.customer_id;

Use INNER JOIN when you need records that exist in both tables—e.g., orders with valid customers.

2. LEFT (OUTER) JOIN

Returns all rows from the left table and matching rows from the right. Non‐matching right‐table values become NULL.

sql

SELECT c.customer_name,
       o.order_id
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id;

Ideal for finding all customers—even those without orders.

3. RIGHT (OUTER) JOIN

Symmetric to LEFT JOIN: all rows from the right table plus any matching left-table rows.

sql

SELECT p.product_id,
       s.supplier_name
FROM products p
RIGHT JOIN suppliers s
  ON p.supplier_id = s.supplier_id;

Use RIGHT JOIN when you want every supplier listed, regardless of whether they supply a product.

4. FULL (OUTER) JOIN

Combines LEFT and RIGHT JOIN: returns every row from both tables, with NULLs where no match exists.

sql

SELECT a.id AS id_a,
       b.id AS id_b
FROM table_a a
FULL OUTER JOIN table_b b
  ON a.id = b.id;

Great for reconciling two datasets and spotting unmatched records on either side.

5. CROSS JOIN

Produces a Cartesian product: every row in table A paired with every row in table B.

sql

SELECT c.customer_name,
       p.product_name
FROM customers c
CROSS JOIN products p;

Use sparingly—only when you need all possible combinations (e.g., creating a calendar of dates × products).

6. SELF JOIN

Joins a table to itself, usually to compare rows within the same table.

sql

SELECT e1.employee_name AS manager,
       e2.employee_name AS subordinate
FROM employees e1
JOIN employees e2
  ON e1.employee_id = e2.manager_id;

Ideal for hierarchical data like organizational charts.

Set Operations

While JOINs combine columns, set operations merge entire result sets. They’re useful when you want to unify, intersect, or exclude data across similar queries.

1. UNION vs. UNION ALL

UNION removes duplicate rows
UNION ALL retains duplicates and is faster

sql

SELECT email FROM newsletter_subscribers
UNION
SELECT email FROM customers;

Merge two email lists without repeats. Use UNION ALL if you need to count duplicates.

2. INTERSECT

Returns only rows common to both queries.

sql

SELECT customer_id FROM high_value_customers
INTERSECT
SELECT customer_id FROM churn_risk_list;

Identify customers who are both high‐value and at risk of churn.

3. EXCEPT (or MINUS)

Subtracts the second result set from the first. In some databases, use MINUS instead of EXCEPT.

sql

SELECT employee_id FROM all_employees
EXCEPT
SELECT employee_id FROM terminated_employees;

List active employees by removing terminated ones.

Best Practices and Performance Tips

Always specify column lists instead of SELECT * in JOINs to reduce I/O.
Index the columns used in ON clauses and set operations for faster joins and deduplication.
For large datasets, prefer UNION ALL over UNION when deduplication isn’t needed.
When multiple joins are required, join smaller tables first to minimize intermediate result sizes.

Combining tables with JOIN operations and set operations transforms isolated datasets into strategic assets. In the next module, we’ll cover data modification—INSERT, UPDATE, and DELETE—so you can not only read but also write and maintain your databases. Stay tuned!

Big Data and AI

Search This Blog