Part V: Designing Your Schema Chapter 12: Keys and Indexes

Chapter 12: Keys and Indexes

Designing a robust schema goes beyond defining tables and columns. The right keys and indexes enforce data integrity, prevent anomalies, and turbocharge query performance. In this chapter, you’ll learn how to:

Distinguish and implement primary keys and surrogate keys
Configure foreign keys to uphold referential integrity
Create the most effective indexes for your queries
Balance speed gains against storage and maintenance costs

By the end, you’ll know how to architect schemas that both safeguard your data and serve queries at lightning speed.

1. Primary Keys and Surrogate Keys

A table’s primary key uniquely identifies each row. Choosing the right key type sets the foundation for joins, lookups, and integrity.

1.1 Natural vs. Surrogate Keys

Natural Key An existing attribute or set of attributes that uniquely describes the entity (e.g., a country’s ISO code, a user’s email). • Pros: No extra column needed; human-readable. • Cons: Subject to business-driven changes; large or composite natural keys bloat indexes and FKs.
Surrogate Key An arbitrary, system-generated identifier (e.g., SERIAL, UUID, IDENTITY). • Pros: Immutable, fixed-size, minimal. • Cons: Requires an extra column; not meaningful outside the system.

1.2 Defining Primary Keys

sql

CREATE TABLE products (
  product_id   SERIAL     PRIMARY KEY,
  sku          VARCHAR(20) NOT NULL UNIQUE,
  name         TEXT       NOT NULL
);

Always declare one primary key per table.
For composite natural keys, use multiple columns:
sql
```
PRIMARY KEY (order_id, product_id)
```

1.3 Best Practices

Favor surrogate keys when natural keys are large, composite, or likely to change.
Use natural keys when they are stable, compact, and meaningful (e.g., ISO codes).
Enforce key constraints at the schema level; avoid ad hoc uniqueness checks in application code.

2. Establishing Foreign Keys for Referential Integrity

Foreign keys link child rows to parent rows and prevent orphaned records.

2.1 Declaring Foreign Keys

sql

CREATE TABLE orders (
  order_id     SERIAL      PRIMARY KEY,
  customer_id  INT         NOT NULL,
  order_date   DATE        NOT NULL,
  CONSTRAINT fk_customer
    FOREIGN KEY (customer_id)
    REFERENCES customers (customer_id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);

ON DELETE RESTRICT: block deletion of a referenced parent.
ON DELETE CASCADE: automatically remove child rows when the parent is deleted.
ON UPDATE: propagate or restrict parent key changes.

2.2 Choosing Delete/Update Actions

Action	Description	Use Case
RESTRICT/NO ACTION	Prevent parent deletion if children exist	Core business data—never lose historical records
CASCADE	Automatically delete or update children	Dependent lookup tables or caches
SET NULL	Nullify the FK in child rows	Soft associations where absence is valid
SET DEFAULT	Reset FK to a default value	Legacy systems requiring a fallback reference

2.3 Best Practices

Always index foreign key columns to accelerate joins and cascades.
Document the chosen referential actions; downstream developers must understand how deletes/updates propagate.
Avoid circular FK references; if necessary, break them with deferred constraints or manual cleanup.

3. Creating Indexes to Accelerate Query Performance

Indexes are specialized data structures that allow the database engine to locate rows quickly. Proper indexing can reduce a full table scan to a single index lookup.

3.1 Index Types

B-Tree Index (Default) Balanced tree structure for equality and range queries.
Hash Index Optimized for simple equality checks (=), unsupported for ranges in most systems.
Expression / Functional Index Index on the result of an expression or function (e.g., LOWER(email)).
Partial Index Index only a subset of rows matching a WHERE predicate (e.g., status = 'active').
Unique Index Enforces uniqueness in addition to speeding lookups (used by UNIQUE constraints).

3.2 Creating Indexes

sql

-- Single-column index
CREATE INDEX idx_orders_date
  ON orders (order_date);

-- Composite index (best when you filter by both columns)
CREATE INDEX idx_orders_customer_date
  ON orders (customer_id, order_date);

-- Functional index
CREATE INDEX idx_customers_lower_email
  ON customers (LOWER(email));

-- Partial index in PostgreSQL
CREATE INDEX idx_active_products
  ON products (category_id)
  WHERE is_active = TRUE;

3.3 Covering Indexes

A covering index includes all columns needed by the query, avoiding a table lookup:

sql

CREATE INDEX idx_invoice_covering
  ON invoice (customer_id, status, total_amount);

A query filtering by customer_id & status and returning total_amount can be satisfied entirely by the index.

4. Understanding Index Trade-Offs and Storage Overhead

While indexes speed reads, they introduce costs:

4.1 Write Performance Impact

INSERT / UPDATE / DELETE statements must maintain all affected indexes, adding I/O and CPU overhead.
Composite and expression indexes incur larger maintenance penalties.

4.2 Storage Consumption

Each index consumes disk space roughly proportional to the number of indexed rows times the key size.
Unused or duplicate indexes waste storage and slow DML operations.

4.3 Fragmentation and Maintenance

Over time, b-tree pages can become fragmented, reducing performance.
Periodically REINDEX, OPTIMIZE, or VACUUM your indexes based on your RDBMS.

5. Indexing Strategies and Monitoring

5.1 When to Index

Columns used frequently in WHERE, JOIN, ORDER BY, or GROUP BY clauses.
Highly selective columns (low percentage of duplicate values).
Foreign key columns for fast parent lookups.

5.2 When to Avoid Indexes

Columns with low cardinality (e.g., boolean flags).
Very large text columns—consider full-text search indexes instead.
Tables with heavy write loads where read patterns don’t justify the overhead.

5.3 Monitoring Index Usage

Use EXPLAIN plans to see which indexes your queries employ.
Query database catalog views for index hit rates and scan counts (e.g., PostgreSQL’s pg_stat_user_indexes).
Drop or adjust indexes that rarely appear in query plans.

6. Designing for Data Integrity and Performance

By combining keys and indexes effectively, you build schemas that are both safe and performant:

Define minimal primary keys—one column when possible.
Enforce referential integrity with FKs and explicit actions.
Index foreign keys to support joins and cascades.
Add indexes on columns central to your queries, but limit their number.
Use composite and covering indexes for multi-column filters and projections.
Regularly audit your indexing strategy and adjust as access patterns evolve.

Big Data and AI

Search This Blog

Part V: Designing Your Schema Chapter 12: Keys and Indexes

Chapter 12: Keys and Indexes

1. Primary Keys and Surrogate Keys

1.1 Natural vs. Surrogate Keys

1.2 Defining Primary Keys

1.3 Best Practices

2. Establishing Foreign Keys for Referential Integrity

2.1 Declaring Foreign Keys

2.2 Choosing Delete/Update Actions

2.3 Best Practices

3. Creating Indexes to Accelerate Query Performance

3.1 Index Types

3.2 Creating Indexes

3.3 Covering Indexes

4. Understanding Index Trade-Offs and Storage Overhead

4.1 Write Performance Impact

4.2 Storage Consumption

4.3 Fragmentation and Maintenance

5. Indexing Strategies and Monitoring

5.1 When to Index

5.2 When to Avoid Indexes

5.3 Monitoring Index Usage

6. Designing for Data Integrity and Performance

Labels

Comments

Post a Comment

Popular posts from this blog

Alfred Marshall – The Father of Modern Microeconomics

Fundamental Analysis Case Study NVIDIA

Unlocking South America's Data Potential: Trends, Challenges, and Strategic Opportunities for 2025