Chapter 15: Performance Tuning & Best Practices
When your SQL queries work correctly but run painfully slow on large tables, it’s time to diagnose, optimize, and monitor. This chapter covers the full lifecycle of performance tuning: reading execution plans, spotting bottlenecks, rewriting queries, tuning indexes, and tracking metrics over time. Whether you manage a handful of gigabytes or petabytes of data, these techniques will elevate your SQL from functional to blazing fast.
1. Reading and Interpreting Execution Plans
Execution plans reveal how the database engine executes your queries. They show join strategies, index usage, sort operations, and estimated vs. actual row counts. Learning to read them is the first step toward optimization.
1.1 Generating an Execution Plan
PostgreSQL:
sqlEXPLAIN ANALYZE SELECT * FROM orders WHERE order_date > '2025-01-01';MySQL:
sqlEXPLAIN FORMAT=JSON SELECT * FROM orders WHERE order_date > '2025-01-01';SQL Server:
sqlSET STATISTICS PROFILE ON; GO SELECT * FROM orders WHERE order_date > '2025-01-01'; GO SET STATISTICS PROFILE OFF;
1.2 Key Plan Elements
Seq Scan / Table Scan: Full scan of all rows.
Index Scan: Uses an index to locate rows.
Bitmap Index Scan: Combines multiple index conditions into a bitmap.
Nested Loop Join: Iterates one row at a time, nested within another set.
Hash Join: Builds a hash table on the smaller input, then probes with the larger input.
Sort: Orders rows; expensive if not supported by an index.
Compare estimated rows vs. actual rows: large discrepancies often indicate outdated statistics or wrong cardinality estimates, leading to suboptimal plan choices.
2. Identifying Slow Operations
Once you’ve examined the plan, pinpoint the slowest steps:
Full Table Scans on large tables without selective filters.
Missing Indexes in
WHERE,JOIN,GROUP BY, orORDER BYcolumns.Costly Sorts when results must be ordered without an index.
Nested Loop Joins on large inputs, which degrade as table sizes grow.
Wide Data Transfers when
SELECT *pulls unnecessary columns.
2.1 Using Profiling Tools
PostgreSQL:
pg_stat_statementsview to find top-consuming queries.MySQL: Performance schema or slow query log (
long_query_time).SQL Server: Dynamic Management Views (DMVs) like
sys.dm_exec_query_stats.
Collect the top 10 slowest queries by total time and examine their plans.
3. Techniques for Query Rewriting
Rewriting queries can dramatically improve performance without schema changes.
3.1 Push Filters Early
Apply WHERE conditions as close to the data source as possible:
-- Suboptimal: filter after join
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2025-01-01';
-- Optimal: same, but ensure the optimizer pushes order_date filter before join
Most engines auto-push filters, but explicit CTEs or derived tables can block optimizations. Avoid:
WITH tmp AS (
SELECT * FROM orders
)
SELECT * FROM tmp WHERE order_date > ...
3.2 Replace Correlated Subqueries with Joins
Correlated subqueries can execute per row—replace them with set-based joins:
-- Correlated subquery
SELECT e.*,
(SELECT COUNT(*) FROM orders o WHERE o.employee_id = e.id) AS cnt
FROM employees e;
-- Rewrite using join and aggregation
SELECT e.id, e.name, COUNT(o.order_id) AS cnt
FROM employees e
LEFT JOIN orders o ON o.employee_id = e.id
GROUP BY e.id, e.name;
3.3 Limit Data with LIMIT or TOP`
During development, always test with a small sample:
SELECT * FROM huge_table ORDER BY created_at DESC LIMIT 1000;
This speeds up iteration and avoids waiting for full scans.
4. Index Tuning Strategies
Proper indexes turn full scans into targeted lookups, but over-indexing slows writes and wastes space.
4.1 Single-Column vs. Composite Indexes
Single-Column: Ideal for queries filtering on one column.
Composite: Cover multiple columns in filter or join order:
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);
Queries filtering by both customer_id and order_date benefit from this composite index.
4.2 Covering Indexes
A covering index includes all columns needed by a query, removing the need to access the base table:
CREATE INDEX idx_orders_covering
ON orders (customer_id, status)
INCLUDE (order_date, total_amount);
In SQL Server, use INCLUDE to add non-key columns to the index leaf.
4.3 Partial and Functional Indexes
Partial Index (PostgreSQL):
sqlCREATE INDEX idx_active_orders ON orders (order_date) WHERE status = 'active';Functional Index:
sqlCREATE INDEX idx_lower_email ON customers (LOWER(email));
Use these when you query on expressions or a subset of data.
4.4 Index Maintenance
VACUUM/ANALYZE (PostgreSQL) or OPTIMIZE TABLE (MySQL) to update statistics and reclaim bloat.
REBUILD INDEX on SQL Server to defragment.
Schedule maintenance during low-traffic windows.
5. Monitoring Tools and Metrics
Continuous monitoring catches regressions before they impact users.
5.1 Key Metrics to Track
Query Latency: average and 95th percentile execution time.
Throughput: queries per second.
Lock Waits: duration of blocked transactions.
Index Usage: ratio of index scans vs. table scans.
Resource Utilization: CPU, memory, I/O on the database server.
5.2 Popular Monitoring Solutions
pgAdmin/pgwatch2 for PostgreSQL dashboards.
Percona Monitoring and Management (PMM) for MySQL/MariaDB.
Azure Data Studio or SQL Sentry for SQL Server performance.
Grafana + Prometheus with exporters (e.g.,
postgres_exporter,mysqld_exporter).
Set up alerts on slow-query spikes, high lock contention, or I/O saturation.
6. Putting It All Together
Profile your workload and identify critical queries.
EXPLAIN those queries to understand their plans.
Rewrite or refactor queries—push filters, replace subqueries, limit results.
Tune indexes—add or modify single, composite, covering, partial, or functional indexes.
Maintain indexes and statistics regularly.
Monitor performance metrics continuously and iterate when patterns change.
By adopting this workflow, you’ll transform your SQL environment from reactive troubleshooting to proactive performance engineering.

Comments
Post a Comment