Part VI: Advanced SQL Concepts Chapter 15: Performance Tuning & Best Practices

Chapter 15: Performance Tuning & Best Practices

When your SQL queries work correctly but run painfully slow on large tables, it’s time to diagnose, optimize, and monitor. This chapter covers the full lifecycle of performance tuning: reading execution plans, spotting bottlenecks, rewriting queries, tuning indexes, and tracking metrics over time. Whether you manage a handful of gigabytes or petabytes of data, these techniques will elevate your SQL from functional to blazing fast.

1. Reading and Interpreting Execution Plans

Execution plans reveal how the database engine executes your queries. They show join strategies, index usage, sort operations, and estimated vs. actual row counts. Learning to read them is the first step toward optimization.

1.1 Generating an Execution Plan

PostgreSQL:

sql

EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date > '2025-01-01';

MySQL:

sql

EXPLAIN FORMAT=JSON
SELECT * FROM orders WHERE order_date > '2025-01-01';

SQL Server:

sql

SET STATISTICS PROFILE ON;
GO
SELECT * FROM orders WHERE order_date > '2025-01-01';
GO
SET STATISTICS PROFILE OFF;

1.2 Key Plan Elements

Seq Scan / Table Scan: Full scan of all rows.
Index Scan: Uses an index to locate rows.
Bitmap Index Scan: Combines multiple index conditions into a bitmap.
Nested Loop Join: Iterates one row at a time, nested within another set.
Hash Join: Builds a hash table on the smaller input, then probes with the larger input.
Sort: Orders rows; expensive if not supported by an index.

Compare estimated rows vs. actual rows: large discrepancies often indicate outdated statistics or wrong cardinality estimates, leading to suboptimal plan choices.

2. Identifying Slow Operations

Once you’ve examined the plan, pinpoint the slowest steps:

Full Table Scans on large tables without selective filters.
Missing Indexes in WHERE, JOIN, GROUP BY, or ORDER BY columns.
Costly Sorts when results must be ordered without an index.
Nested Loop Joins on large inputs, which degrade as table sizes grow.
Wide Data Transfers when SELECT * pulls unnecessary columns.

2.1 Using Profiling Tools

PostgreSQL: pg_stat_statements view to find top-consuming queries.
MySQL: Performance schema or slow query log (long_query_time).
SQL Server: Dynamic Management Views (DMVs) like sys.dm_exec_query_stats.

Collect the top 10 slowest queries by total time and examine their plans.

3. Techniques for Query Rewriting

Rewriting queries can dramatically improve performance without schema changes.

3.1 Push Filters Early

Apply WHERE conditions as close to the data source as possible:

sql

-- Suboptimal: filter after join
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2025-01-01';

-- Optimal: same, but ensure the optimizer pushes order_date filter before join

Most engines auto-push filters, but explicit CTEs or derived tables can block optimizations. Avoid:

sql

WITH tmp AS (
  SELECT * FROM orders
)
SELECT * FROM tmp WHERE order_date > ...

3.2 Replace Correlated Subqueries with Joins

Correlated subqueries can execute per row—replace them with set-based joins:

sql

-- Correlated subquery
SELECT e.*, 
  (SELECT COUNT(*) FROM orders o WHERE o.employee_id = e.id) AS cnt
FROM employees e;

-- Rewrite using join and aggregation
SELECT e.id, e.name, COUNT(o.order_id) AS cnt
FROM employees e
LEFT JOIN orders o ON o.employee_id = e.id
GROUP BY e.id, e.name;

3.3 Limit Data with `LIMIT` or TOP`

During development, always test with a small sample:

sql

SELECT * FROM huge_table ORDER BY created_at DESC LIMIT 1000;

This speeds up iteration and avoids waiting for full scans.

4. Index Tuning Strategies

Proper indexes turn full scans into targeted lookups, but over-indexing slows writes and wastes space.

4.1 Single-Column vs. Composite Indexes

Single-Column: Ideal for queries filtering on one column.
Composite: Cover multiple columns in filter or join order:

sql

CREATE INDEX idx_orders_customer_date
  ON orders (customer_id, order_date);

Queries filtering by both customer_id and order_date benefit from this composite index.

4.2 Covering Indexes

A covering index includes all columns needed by a query, removing the need to access the base table:

sql

CREATE INDEX idx_orders_covering
  ON orders (customer_id, status)
  INCLUDE (order_date, total_amount);

In SQL Server, use INCLUDE to add non-key columns to the index leaf.

4.3 Partial and Functional Indexes

Partial Index (PostgreSQL):

sql

CREATE INDEX idx_active_orders
  ON orders (order_date)
  WHERE status = 'active';

Functional Index:

sql

CREATE INDEX idx_lower_email
  ON customers (LOWER(email));

Use these when you query on expressions or a subset of data.

4.4 Index Maintenance

VACUUM/ANALYZE (PostgreSQL) or OPTIMIZE TABLE (MySQL) to update statistics and reclaim bloat.
REBUILD INDEX on SQL Server to defragment.
Schedule maintenance during low-traffic windows.

5. Monitoring Tools and Metrics

Continuous monitoring catches regressions before they impact users.

5.1 Key Metrics to Track

Query Latency: average and 95th percentile execution time.
Throughput: queries per second.
Lock Waits: duration of blocked transactions.
Index Usage: ratio of index scans vs. table scans.
Resource Utilization: CPU, memory, I/O on the database server.

5.2 Popular Monitoring Solutions

pgAdmin/pgwatch2 for PostgreSQL dashboards.
Percona Monitoring and Management (PMM) for MySQL/MariaDB.
Azure Data Studio or SQL Sentry for SQL Server performance.
Grafana + Prometheus with exporters (e.g., postgres_exporter, mysqld_exporter).

Set up alerts on slow-query spikes, high lock contention, or I/O saturation.

6. Putting It All Together

Profile your workload and identify critical queries.
EXPLAIN those queries to understand their plans.
Rewrite or refactor queries—push filters, replace subqueries, limit results.
Tune indexes—add or modify single, composite, covering, partial, or functional indexes.
Maintain indexes and statistics regularly.
Monitor performance metrics continuously and iterate when patterns change.

By adopting this workflow, you’ll transform your SQL environment from reactive troubleshooting to proactive performance engineering.

Big Data and AI

Search This Blog

Part VI: Advanced SQL Concepts Chapter 15: Performance Tuning & Best Practices

Chapter 15: Performance Tuning & Best Practices

1. Reading and Interpreting Execution Plans

1.1 Generating an Execution Plan

1.2 Key Plan Elements

2. Identifying Slow Operations

2.1 Using Profiling Tools

3. Techniques for Query Rewriting

3.1 Push Filters Early

3.2 Replace Correlated Subqueries with Joins

3.3 Limit Data with `LIMIT` or TOP`

4. Index Tuning Strategies

4.1 Single-Column vs. Composite Indexes

4.2 Covering Indexes

4.3 Partial and Functional Indexes

4.4 Index Maintenance

5. Monitoring Tools and Metrics

5.1 Key Metrics to Track

5.2 Popular Monitoring Solutions

6. Putting It All Together

Labels

Comments

Post a Comment

Popular posts from this blog

Alfred Marshall – The Father of Modern Microeconomics

Fundamental Analysis Case Study NVIDIA

Behavioral Portfolio Theory (BPT) – Rethinking Investor Behavior and Portfolio Construction

Big Data and AI

Part VI: Advanced SQL Concepts Chapter 15: Performance Tuning & Best Practices

Chapter 15: Performance Tuning & Best Practices

1. Reading and Interpreting Execution Plans

1.1 Generating an Execution Plan

1.2 Key Plan Elements

2. Identifying Slow Operations

2.1 Using Profiling Tools

3. Techniques for Query Rewriting

3.1 Push Filters Early

3.2 Replace Correlated Subqueries with Joins

3.3 Limit Data with LIMIT or TOP`

4. Index Tuning Strategies

4.1 Single-Column vs. Composite Indexes

4.2 Covering Indexes

4.3 Partial and Functional Indexes

4.4 Index Maintenance

5. Monitoring Tools and Metrics

5.1 Key Metrics to Track

5.2 Popular Monitoring Solutions

6. Putting It All Together

Labels

Comments

Post a Comment

Popular posts from this blog

Alfred Marshall – The Father of Modern Microeconomics

Fundamental Analysis Case Study NVIDIA

Behavioral Portfolio Theory (BPT) – Rethinking Investor Behavior and Portfolio Construction

3.3 Limit Data with `LIMIT` or TOP`