In real-world databases, data is rarely confined to a single table. To unlock deeper insights, you need to learn how to merge and compare datasets stored in different tables. In this third installment of our SQL tutorial series on Data Analyst BI, we’ll explore:
JOIN Operations: Linking rows from multiple tables based on related columns
Set Operations: Merging result sets via UNION, INTERSECT, and EXCEPT
Mastering these techniques will enable you to build richer reports, enforce data integrity, and answer complex business questions.
JOIN Operations
JOINs are the cornerstone of combining data across tables. By matching rows on key columns, you can assemble comprehensive views of your data.
1. INNER JOIN
Returns only rows that have matching keys in both tables.
SELECT o.order_id,
c.customer_name,
o.order_date
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id;
Use INNER JOIN when you need records that exist in both tables—e.g., orders with valid customers.
2. LEFT (OUTER) JOIN
Returns all rows from the left table and matching rows from the right. Non‐matching right‐table values become NULL.
SELECT c.customer_name,
o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id;
Ideal for finding all customers—even those without orders.
3. RIGHT (OUTER) JOIN
Symmetric to LEFT JOIN: all rows from the right table plus any matching left-table rows.
SELECT p.product_id,
s.supplier_name
FROM products p
RIGHT JOIN suppliers s
ON p.supplier_id = s.supplier_id;
Use RIGHT JOIN when you want every supplier listed, regardless of whether they supply a product.
4. FULL (OUTER) JOIN
Combines LEFT and RIGHT JOIN: returns every row from both tables, with NULLs where no match exists.
SELECT a.id AS id_a,
b.id AS id_b
FROM table_a a
FULL OUTER JOIN table_b b
ON a.id = b.id;
Great for reconciling two datasets and spotting unmatched records on either side.
5. CROSS JOIN
Produces a Cartesian product: every row in table A paired with every row in table B.
SELECT c.customer_name,
p.product_name
FROM customers c
CROSS JOIN products p;
Use sparingly—only when you need all possible combinations (e.g., creating a calendar of dates × products).
6. SELF JOIN
Joins a table to itself, usually to compare rows within the same table.
SELECT e1.employee_name AS manager,
e2.employee_name AS subordinate
FROM employees e1
JOIN employees e2
ON e1.employee_id = e2.manager_id;
Ideal for hierarchical data like organizational charts.
Set Operations
While JOINs combine columns, set operations merge entire result sets. They’re useful when you want to unify, intersect, or exclude data across similar queries.
1. UNION vs. UNION ALL
UNION removes duplicate rows
UNION ALL retains duplicates and is faster
SELECT email FROM newsletter_subscribers
UNION
SELECT email FROM customers;
Merge two email lists without repeats. Use UNION ALL if you need to count duplicates.
2. INTERSECT
Returns only rows common to both queries.
SELECT customer_id FROM high_value_customers
INTERSECT
SELECT customer_id FROM churn_risk_list;
Identify customers who are both high‐value and at risk of churn.
3. EXCEPT (or MINUS)
Subtracts the second result set from the first. In some databases, use MINUS instead of EXCEPT.
SELECT employee_id FROM all_employees
EXCEPT
SELECT employee_id FROM terminated_employees;
List active employees by removing terminated ones.
Best Practices and Performance Tips
Always specify column lists instead of
SELECT *in JOINs to reduce I/O.Index the columns used in
ONclauses and set operations for faster joins and deduplication.For large datasets, prefer
UNION ALLoverUNIONwhen deduplication isn’t needed.When multiple joins are required, join smaller tables first to minimize intermediate result sizes.
Combining tables with JOIN operations and set operations transforms isolated datasets into strategic assets. In the next module, we’ll cover data modification—INSERT, UPDATE, and DELETE—so you can not only read but also write and maintain your databases. Stay tuned!

Comments
Post a Comment