Chapter 12: Keys and Indexes
Designing a robust schema goes beyond defining tables and columns. The right keys and indexes enforce data integrity, prevent anomalies, and turbocharge query performance. In this chapter, you’ll learn how to:
Distinguish and implement primary keys and surrogate keys
Configure foreign keys to uphold referential integrity
Create the most effective indexes for your queries
Balance speed gains against storage and maintenance costs
By the end, you’ll know how to architect schemas that both safeguard your data and serve queries at lightning speed.
1. Primary Keys and Surrogate Keys
A table’s primary key uniquely identifies each row. Choosing the right key type sets the foundation for joins, lookups, and integrity.
1.1 Natural vs. Surrogate Keys
Natural Key An existing attribute or set of attributes that uniquely describes the entity (e.g., a country’s ISO code, a user’s email). • Pros: No extra column needed; human-readable. • Cons: Subject to business-driven changes; large or composite natural keys bloat indexes and FKs.
Surrogate Key An arbitrary, system-generated identifier (e.g.,
SERIAL,UUID,IDENTITY). • Pros: Immutable, fixed-size, minimal. • Cons: Requires an extra column; not meaningful outside the system.
1.2 Defining Primary Keys
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
sku VARCHAR(20) NOT NULL UNIQUE,
name TEXT NOT NULL
);
Always declare one primary key per table.
For composite natural keys, use multiple columns:
sqlPRIMARY KEY (order_id, product_id)
1.3 Best Practices
Favor surrogate keys when natural keys are large, composite, or likely to change.
Use natural keys when they are stable, compact, and meaningful (e.g., ISO codes).
Enforce key constraints at the schema level; avoid ad hoc uniqueness checks in application code.
2. Establishing Foreign Keys for Referential Integrity
Foreign keys link child rows to parent rows and prevent orphaned records.
2.1 Declaring Foreign Keys
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
CONSTRAINT fk_customer
FOREIGN KEY (customer_id)
REFERENCES customers (customer_id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);
ON DELETE RESTRICT: block deletion of a referenced parent.
ON DELETE CASCADE: automatically remove child rows when the parent is deleted.
ON UPDATE: propagate or restrict parent key changes.
2.2 Choosing Delete/Update Actions
| Action | Description | Use Case |
|---|---|---|
| RESTRICT/NO ACTION | Prevent parent deletion if children exist | Core business data—never lose historical records |
| CASCADE | Automatically delete or update children | Dependent lookup tables or caches |
| SET NULL | Nullify the FK in child rows | Soft associations where absence is valid |
| SET DEFAULT | Reset FK to a default value | Legacy systems requiring a fallback reference |
2.3 Best Practices
Always index foreign key columns to accelerate joins and cascades.
Document the chosen referential actions; downstream developers must understand how deletes/updates propagate.
Avoid circular FK references; if necessary, break them with deferred constraints or manual cleanup.
3. Creating Indexes to Accelerate Query Performance
Indexes are specialized data structures that allow the database engine to locate rows quickly. Proper indexing can reduce a full table scan to a single index lookup.
3.1 Index Types
B-Tree Index (Default) Balanced tree structure for equality and range queries.
Hash Index Optimized for simple equality checks (
=), unsupported for ranges in most systems.Expression / Functional Index Index on the result of an expression or function (e.g.,
LOWER(email)).Partial Index Index only a subset of rows matching a
WHEREpredicate (e.g.,status = 'active').Unique Index Enforces uniqueness in addition to speeding lookups (used by UNIQUE constraints).
3.2 Creating Indexes
-- Single-column index
CREATE INDEX idx_orders_date
ON orders (order_date);
-- Composite index (best when you filter by both columns)
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);
-- Functional index
CREATE INDEX idx_customers_lower_email
ON customers (LOWER(email));
-- Partial index in PostgreSQL
CREATE INDEX idx_active_products
ON products (category_id)
WHERE is_active = TRUE;
3.3 Covering Indexes
A covering index includes all columns needed by the query, avoiding a table lookup:
CREATE INDEX idx_invoice_covering
ON invoice (customer_id, status, total_amount);
A query filtering by customer_id & status and returning total_amount can be satisfied entirely by the index.
4. Understanding Index Trade-Offs and Storage Overhead
While indexes speed reads, they introduce costs:
4.1 Write Performance Impact
INSERT / UPDATE / DELETE statements must maintain all affected indexes, adding I/O and CPU overhead.
Composite and expression indexes incur larger maintenance penalties.
4.2 Storage Consumption
Each index consumes disk space roughly proportional to the number of indexed rows times the key size.
Unused or duplicate indexes waste storage and slow DML operations.
4.3 Fragmentation and Maintenance
Over time, b-tree pages can become fragmented, reducing performance.
Periodically REINDEX, OPTIMIZE, or VACUUM your indexes based on your RDBMS.
5. Indexing Strategies and Monitoring
5.1 When to Index
Columns used frequently in
WHERE,JOIN,ORDER BY, orGROUP BYclauses.Highly selective columns (low percentage of duplicate values).
Foreign key columns for fast parent lookups.
5.2 When to Avoid Indexes
Columns with low cardinality (e.g., boolean flags).
Very large text columns—consider full-text search indexes instead.
Tables with heavy write loads where read patterns don’t justify the overhead.
5.3 Monitoring Index Usage
Use EXPLAIN plans to see which indexes your queries employ.
Query database catalog views for index hit rates and scan counts (e.g., PostgreSQL’s
pg_stat_user_indexes).Drop or adjust indexes that rarely appear in query plans.
6. Designing for Data Integrity and Performance
By combining keys and indexes effectively, you build schemas that are both safe and performant:
Define minimal primary keys—one column when possible.
Enforce referential integrity with FKs and explicit actions.
Index foreign keys to support joins and cascades.
Add indexes on columns central to your queries, but limit their number.
Use composite and covering indexes for multi-column filters and projections.
Regularly audit your indexing strategy and adjust as access patterns evolve.

Comments
Post a Comment