Skip to main content

Part V: Designing Your Schema Chapter 12: Keys and Indexes

 

Chapter 12: Keys and Indexes



Designing a robust schema goes beyond defining tables and columns. The right keys and indexes enforce data integrity, prevent anomalies, and turbocharge query performance. In this chapter, you’ll learn how to:

  • Distinguish and implement primary keys and surrogate keys

  • Configure foreign keys to uphold referential integrity

  • Create the most effective indexes for your queries

  • Balance speed gains against storage and maintenance costs

By the end, you’ll know how to architect schemas that both safeguard your data and serve queries at lightning speed.

1. Primary Keys and Surrogate Keys

A table’s primary key uniquely identifies each row. Choosing the right key type sets the foundation for joins, lookups, and integrity.

1.1 Natural vs. Surrogate Keys

  • Natural Key An existing attribute or set of attributes that uniquely describes the entity (e.g., a country’s ISO code, a user’s email). • Pros: No extra column needed; human-readable. • Cons: Subject to business-driven changes; large or composite natural keys bloat indexes and FKs.

  • Surrogate Key An arbitrary, system-generated identifier (e.g., SERIAL, UUID, IDENTITY). • Pros: Immutable, fixed-size, minimal. • Cons: Requires an extra column; not meaningful outside the system.

1.2 Defining Primary Keys

sql
CREATE TABLE products (
  product_id   SERIAL     PRIMARY KEY,
  sku          VARCHAR(20) NOT NULL UNIQUE,
  name         TEXT       NOT NULL
);
  • Always declare one primary key per table.

  • For composite natural keys, use multiple columns:

    sql
    PRIMARY KEY (order_id, product_id)
    

1.3 Best Practices

  • Favor surrogate keys when natural keys are large, composite, or likely to change.

  • Use natural keys when they are stable, compact, and meaningful (e.g., ISO codes).

  • Enforce key constraints at the schema level; avoid ad hoc uniqueness checks in application code.

2. Establishing Foreign Keys for Referential Integrity

Foreign keys link child rows to parent rows and prevent orphaned records.

2.1 Declaring Foreign Keys

sql
CREATE TABLE orders (
  order_id     SERIAL      PRIMARY KEY,
  customer_id  INT         NOT NULL,
  order_date   DATE        NOT NULL,
  CONSTRAINT fk_customer
    FOREIGN KEY (customer_id)
    REFERENCES customers (customer_id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);
  • ON DELETE RESTRICT: block deletion of a referenced parent.

  • ON DELETE CASCADE: automatically remove child rows when the parent is deleted.

  • ON UPDATE: propagate or restrict parent key changes.

2.2 Choosing Delete/Update Actions

ActionDescriptionUse Case
RESTRICT/NO ACTIONPrevent parent deletion if children existCore business data—never lose historical records
CASCADEAutomatically delete or update childrenDependent lookup tables or caches
SET NULLNullify the FK in child rowsSoft associations where absence is valid
SET DEFAULTReset FK to a default valueLegacy systems requiring a fallback reference

2.3 Best Practices

  • Always index foreign key columns to accelerate joins and cascades.

  • Document the chosen referential actions; downstream developers must understand how deletes/updates propagate.

  • Avoid circular FK references; if necessary, break them with deferred constraints or manual cleanup.

3. Creating Indexes to Accelerate Query Performance

Indexes are specialized data structures that allow the database engine to locate rows quickly. Proper indexing can reduce a full table scan to a single index lookup.

3.1 Index Types

  • B-Tree Index (Default) Balanced tree structure for equality and range queries.

  • Hash Index Optimized for simple equality checks (=), unsupported for ranges in most systems.

  • Expression / Functional Index Index on the result of an expression or function (e.g., LOWER(email)).

  • Partial Index Index only a subset of rows matching a WHERE predicate (e.g., status = 'active').

  • Unique Index Enforces uniqueness in addition to speeding lookups (used by UNIQUE constraints).

3.2 Creating Indexes

sql
-- Single-column index
CREATE INDEX idx_orders_date
  ON orders (order_date);

-- Composite index (best when you filter by both columns)
CREATE INDEX idx_orders_customer_date
  ON orders (customer_id, order_date);

-- Functional index
CREATE INDEX idx_customers_lower_email
  ON customers (LOWER(email));

-- Partial index in PostgreSQL
CREATE INDEX idx_active_products
  ON products (category_id)
  WHERE is_active = TRUE;

3.3 Covering Indexes

A covering index includes all columns needed by the query, avoiding a table lookup:

sql
CREATE INDEX idx_invoice_covering
  ON invoice (customer_id, status, total_amount);

A query filtering by customer_id & status and returning total_amount can be satisfied entirely by the index.

4. Understanding Index Trade-Offs and Storage Overhead

While indexes speed reads, they introduce costs:

4.1 Write Performance Impact

  • INSERT / UPDATE / DELETE statements must maintain all affected indexes, adding I/O and CPU overhead.

  • Composite and expression indexes incur larger maintenance penalties.

4.2 Storage Consumption

  • Each index consumes disk space roughly proportional to the number of indexed rows times the key size.

  • Unused or duplicate indexes waste storage and slow DML operations.

4.3 Fragmentation and Maintenance

  • Over time, b-tree pages can become fragmented, reducing performance.

  • Periodically REINDEX, OPTIMIZE, or VACUUM your indexes based on your RDBMS.

5. Indexing Strategies and Monitoring

5.1 When to Index

  • Columns used frequently in WHERE, JOIN, ORDER BY, or GROUP BY clauses.

  • Highly selective columns (low percentage of duplicate values).

  • Foreign key columns for fast parent lookups.

5.2 When to Avoid Indexes

  • Columns with low cardinality (e.g., boolean flags).

  • Very large text columns—consider full-text search indexes instead.

  • Tables with heavy write loads where read patterns don’t justify the overhead.

5.3 Monitoring Index Usage

  • Use EXPLAIN plans to see which indexes your queries employ.

  • Query database catalog views for index hit rates and scan counts (e.g., PostgreSQL’s pg_stat_user_indexes).

  • Drop or adjust indexes that rarely appear in query plans.

6. Designing for Data Integrity and Performance

By combining keys and indexes effectively, you build schemas that are both safe and performant:

  1. Define minimal primary keys—one column when possible.

  2. Enforce referential integrity with FKs and explicit actions.

  3. Index foreign keys to support joins and cascades.

  4. Add indexes on columns central to your queries, but limit their number.

  5. Use composite and covering indexes for multi-column filters and projections.

  6. Regularly audit your indexing strategy and adjust as access patterns evolve.

Comments

Popular posts from this blog

Alfred Marshall – The Father of Modern Microeconomics

  Welcome back to the blog! Today we explore the life and legacy of Alfred Marshall (1842–1924) , the British economist who laid the foundations of modern microeconomics . His landmark book, Principles of Economics (1890), introduced core concepts like supply and demand , elasticity , and market equilibrium — ideas that continue to shape how we understand economics today. Who Was Alfred Marshall? Alfred Marshall was a professor at the University of Cambridge and a key figure in the development of neoclassical economics . He believed economics should be rigorous, mathematical, and practical , focusing on real-world issues like prices, wages, and consumer behavior. Marshall also emphasized that economics is ultimately about improving human well-being. Key Contributions 1. Supply and Demand Analysis Marshall was the first to clearly present supply and demand as intersecting curves on a graph. He showed how prices are determined by both what consumers are willing to pay (dem...

Fundamental Analysis Case Study NVIDIA

  Executive summary NVIDIA is analyzed here using the full fundamental framework: balance sheet, income statement, cash flow statement, valuation multiples, sector comparison, sensitivity scenarios, and investment checklist. The company shows exceptional profitability, strong cash generation, conservative liquidity and net cash, and premium valuation multiples justified only if high growth and margin profiles persist. Key investment considerations are growth sustainability in data center and AI, margin durability, geopolitical and supply risks, and valuation sensitivity to execution. The detailed numerical work below uses the exact metrics you provided. Company profile and market context Business model and market position Company NVIDIA Corporation, leader in GPUs, AI accelerators, and related software platforms. Core revenue streams : data center GPUs and systems, gaming GPUs, professional visualization, automotive, software and services. Strategic advantage : GPU architecture, C...

Unlocking South America's Data Potential: Trends, Challenges, and Strategic Opportunities for 2025

  Introduction South America is entering a pivotal phase in its digital and economic transformation. With countries like Brazil, Mexico, and Argentina investing heavily in data infrastructure, analytics, and digital governance, the region presents both challenges and opportunities for professionals working in Business Intelligence (BI), Data Analysis, and IT Project Management. This post explores the key data trends shaping South America in 2025, backed by insights from the World Bank, OECD, and Statista. It’s designed for analysts, project managers, and decision-makers who want to understand the region’s evolving landscape and how to position themselves for impact. 1. Economic Outlook: A Region in Transition According to the World Bank’s Global Economic Prospects 2025 , Latin America is expected to experience slower growth compared to global averages, with GDP expansion constrained by trade tensions and policy uncertainty. Brazil and Mexico remain the largest economies, with proj...