Skip to main content

Part VI: Advanced SQL Concepts Chapter 15: Performance Tuning & Best Practices

Chapter 15: Performance Tuning & Best Practices



When your SQL queries work correctly but run painfully slow on large tables, it’s time to diagnose, optimize, and monitor. This chapter covers the full lifecycle of performance tuning: reading execution plans, spotting bottlenecks, rewriting queries, tuning indexes, and tracking metrics over time. Whether you manage a handful of gigabytes or petabytes of data, these techniques will elevate your SQL from functional to blazing fast.

1. Reading and Interpreting Execution Plans

Execution plans reveal how the database engine executes your queries. They show join strategies, index usage, sort operations, and estimated vs. actual row counts. Learning to read them is the first step toward optimization.

1.1 Generating an Execution Plan

  • PostgreSQL:

    sql
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE order_date > '2025-01-01';
    
  • MySQL:

    sql
    EXPLAIN FORMAT=JSON
    SELECT * FROM orders WHERE order_date > '2025-01-01';
    
  • SQL Server:

    sql
    SET STATISTICS PROFILE ON;
    GO
    SELECT * FROM orders WHERE order_date > '2025-01-01';
    GO
    SET STATISTICS PROFILE OFF;
    

1.2 Key Plan Elements

  • Seq Scan / Table Scan: Full scan of all rows.

  • Index Scan: Uses an index to locate rows.

  • Bitmap Index Scan: Combines multiple index conditions into a bitmap.

  • Nested Loop Join: Iterates one row at a time, nested within another set.

  • Hash Join: Builds a hash table on the smaller input, then probes with the larger input.

  • Sort: Orders rows; expensive if not supported by an index.

Compare estimated rows vs. actual rows: large discrepancies often indicate outdated statistics or wrong cardinality estimates, leading to suboptimal plan choices.

2. Identifying Slow Operations

Once you’ve examined the plan, pinpoint the slowest steps:

  • Full Table Scans on large tables without selective filters.

  • Missing Indexes in WHERE, JOIN, GROUP BY, or ORDER BY columns.

  • Costly Sorts when results must be ordered without an index.

  • Nested Loop Joins on large inputs, which degrade as table sizes grow.

  • Wide Data Transfers when SELECT * pulls unnecessary columns.

2.1 Using Profiling Tools

  • PostgreSQL: pg_stat_statements view to find top-consuming queries.

  • MySQL: Performance schema or slow query log (long_query_time).

  • SQL Server: Dynamic Management Views (DMVs) like sys.dm_exec_query_stats.

Collect the top 10 slowest queries by total time and examine their plans.

3. Techniques for Query Rewriting

Rewriting queries can dramatically improve performance without schema changes.

3.1 Push Filters Early

Apply WHERE conditions as close to the data source as possible:

sql
-- Suboptimal: filter after join
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2025-01-01';

-- Optimal: same, but ensure the optimizer pushes order_date filter before join

Most engines auto-push filters, but explicit CTEs or derived tables can block optimizations. Avoid:

sql
WITH tmp AS (
  SELECT * FROM orders
)
SELECT * FROM tmp WHERE order_date > ...

3.2 Replace Correlated Subqueries with Joins

Correlated subqueries can execute per row—replace them with set-based joins:

sql
-- Correlated subquery
SELECT e.*, 
  (SELECT COUNT(*) FROM orders o WHERE o.employee_id = e.id) AS cnt
FROM employees e;

-- Rewrite using join and aggregation
SELECT e.id, e.name, COUNT(o.order_id) AS cnt
FROM employees e
LEFT JOIN orders o ON o.employee_id = e.id
GROUP BY e.id, e.name;

3.3 Limit Data with LIMIT or TOP`

During development, always test with a small sample:

sql
SELECT * FROM huge_table ORDER BY created_at DESC LIMIT 1000;

This speeds up iteration and avoids waiting for full scans.

4. Index Tuning Strategies

Proper indexes turn full scans into targeted lookups, but over-indexing slows writes and wastes space.

4.1 Single-Column vs. Composite Indexes

  • Single-Column: Ideal for queries filtering on one column.

  • Composite: Cover multiple columns in filter or join order:

sql
CREATE INDEX idx_orders_customer_date
  ON orders (customer_id, order_date);

Queries filtering by both customer_id and order_date benefit from this composite index.

4.2 Covering Indexes

A covering index includes all columns needed by a query, removing the need to access the base table:

sql
CREATE INDEX idx_orders_covering
  ON orders (customer_id, status)
  INCLUDE (order_date, total_amount);

In SQL Server, use INCLUDE to add non-key columns to the index leaf.

4.3 Partial and Functional Indexes

  • Partial Index (PostgreSQL):

    sql
    CREATE INDEX idx_active_orders
      ON orders (order_date)
      WHERE status = 'active';
    
  • Functional Index:

    sql
    CREATE INDEX idx_lower_email
      ON customers (LOWER(email));
    

Use these when you query on expressions or a subset of data.

4.4 Index Maintenance

  • VACUUM/ANALYZE (PostgreSQL) or OPTIMIZE TABLE (MySQL) to update statistics and reclaim bloat.

  • REBUILD INDEX on SQL Server to defragment.

  • Schedule maintenance during low-traffic windows.

5. Monitoring Tools and Metrics

Continuous monitoring catches regressions before they impact users.

5.1 Key Metrics to Track

  • Query Latency: average and 95th percentile execution time.

  • Throughput: queries per second.

  • Lock Waits: duration of blocked transactions.

  • Index Usage: ratio of index scans vs. table scans.

  • Resource Utilization: CPU, memory, I/O on the database server.

5.2 Popular Monitoring Solutions

  • pgAdmin/pgwatch2 for PostgreSQL dashboards.

  • Percona Monitoring and Management (PMM) for MySQL/MariaDB.

  • Azure Data Studio or SQL Sentry for SQL Server performance.

  • Grafana + Prometheus with exporters (e.g., postgres_exporter, mysqld_exporter).

Set up alerts on slow-query spikes, high lock contention, or I/O saturation.

6. Putting It All Together

  1. Profile your workload and identify critical queries.

  2. EXPLAIN those queries to understand their plans.

  3. Rewrite or refactor queries—push filters, replace subqueries, limit results.

  4. Tune indexes—add or modify single, composite, covering, partial, or functional indexes.

  5. Maintain indexes and statistics regularly.

  6. Monitor performance metrics continuously and iterate when patterns change.

By adopting this workflow, you’ll transform your SQL environment from reactive troubleshooting to proactive performance engineering.

Comments

Popular posts from this blog

Alfred Marshall – The Father of Modern Microeconomics

  Welcome back to the blog! Today we explore the life and legacy of Alfred Marshall (1842–1924) , the British economist who laid the foundations of modern microeconomics . His landmark book, Principles of Economics (1890), introduced core concepts like supply and demand , elasticity , and market equilibrium — ideas that continue to shape how we understand economics today. Who Was Alfred Marshall? Alfred Marshall was a professor at the University of Cambridge and a key figure in the development of neoclassical economics . He believed economics should be rigorous, mathematical, and practical , focusing on real-world issues like prices, wages, and consumer behavior. Marshall also emphasized that economics is ultimately about improving human well-being. Key Contributions 1. Supply and Demand Analysis Marshall was the first to clearly present supply and demand as intersecting curves on a graph. He showed how prices are determined by both what consumers are willing to pay (dem...

Fundamental Analysis Case Study NVIDIA

  Executive summary NVIDIA is analyzed here using the full fundamental framework: balance sheet, income statement, cash flow statement, valuation multiples, sector comparison, sensitivity scenarios, and investment checklist. The company shows exceptional profitability, strong cash generation, conservative liquidity and net cash, and premium valuation multiples justified only if high growth and margin profiles persist. Key investment considerations are growth sustainability in data center and AI, margin durability, geopolitical and supply risks, and valuation sensitivity to execution. The detailed numerical work below uses the exact metrics you provided. Company profile and market context Business model and market position Company NVIDIA Corporation, leader in GPUs, AI accelerators, and related software platforms. Core revenue streams : data center GPUs and systems, gaming GPUs, professional visualization, automotive, software and services. Strategic advantage : GPU architecture, C...

Unlocking South America's Data Potential: Trends, Challenges, and Strategic Opportunities for 2025

  Introduction South America is entering a pivotal phase in its digital and economic transformation. With countries like Brazil, Mexico, and Argentina investing heavily in data infrastructure, analytics, and digital governance, the region presents both challenges and opportunities for professionals working in Business Intelligence (BI), Data Analysis, and IT Project Management. This post explores the key data trends shaping South America in 2025, backed by insights from the World Bank, OECD, and Statista. It’s designed for analysts, project managers, and decision-makers who want to understand the region’s evolving landscape and how to position themselves for impact. 1. Economic Outlook: A Region in Transition According to the World Bank’s Global Economic Prospects 2025 , Latin America is expected to experience slower growth compared to global averages, with GDP expansion constrained by trade tensions and policy uncertainty. Brazil and Mexico remain the largest economies, with proj...