Part IV: Modifying Data – Chapter 9: Inserting Records
Adding new data into your database is as critical as querying it. A solid INSERT strategy prevents downtime, avoids schema breakage, and ensures data accuracy from day one. In this chapter, we’ll cover:
Basic INSERT INTO … VALUES syntax
Bulk inserts using INSERT INTO … SELECT
Best practices for batching large imports
Verifying inserted data before committing
By the end, you’ll have a reliable workflow for populating your tables safely and efficiently.
1. Basic INSERT INTO … VALUES Syntax
The simplest way to add a row is with the INSERT … VALUES statement. Always specify columns explicitly to guard against schema changes.
1.1 Syntax Structure
INSERT INTO table_name (col1, col2, ..., colN)
VALUES (val1, val2, ..., valN);
table_name: target table
(col1, …, colN): list of columns in the insertion order
VALUES: literal values matching each column’s data type
1.2 Example: Single-Row Insert
Imagine an employees table:
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
hire_date DATE DEFAULT CURRENT_DATE
);
Insert a new employee record:
INSERT INTO employees (first_name, last_name, email)
VALUES ('Alice', 'Martinez', 'alice.martinez@example.com');
employee_id auto-increments; hire_date defaults to today’s date.
1.3 Inserting Multiple Rows
Most RDBMS allow batching multiple VALUES in one statement:
INSERT INTO employees (first_name, last_name, email)
VALUES
('Bob', 'Smith', 'bob.smith@example.com'),
('Carol', 'Johnson', 'carol.johnson@example.com'),
('Dave', 'Williams', 'dave.williams@example.com');
This reduces round-trips to the database and speeds up small inserts.
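From application code, the same multi-row batching is usually done through the driver. As a minimal sketch using Python’s built-in sqlite3 module as a stand-in for your RDBMS, executemany sends all rows in one batched call:

```python
import sqlite3

# In-memory SQLite database as a stand-in for any RDBMS
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        email       TEXT UNIQUE
    )
""")

rows = [
    ("Bob",   "Smith",    "bob.smith@example.com"),
    ("Carol", "Johnson",  "carol.johnson@example.com"),
    ("Dave",  "Williams", "dave.williams@example.com"),
]

# One parameterized statement, many rows
conn.executemany(
    "INSERT INTO employees (first_name, last_name, email) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count)  # 3
```

With a networked database, batching like this avoids one client-server round-trip per row.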
2. Bulk Inserts Using INSERT INTO … SELECT
To load large volumes or copy data from another table, use INSERT … SELECT. This method leverages the database engine’s set-based processing for maximum throughput.
2.1 Syntax Structure
INSERT INTO target_table (col1, col2, …)
SELECT colA, colB, …
FROM source_table
WHERE <filter_conditions>;
2.2 Example: Archiving Old Orders
Suppose you periodically archive orders older than one year:
INSERT INTO orders_archive (order_id, customer_id, order_date, total_amount)
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date < CURRENT_DATE - INTERVAL '1 year';
After confirming the archive, you can delete them from the live table.
2.3 Transforming Data on Insert
You can apply functions and joins within the SELECT to reshape data:
INSERT INTO monthly_revenue (month, total_revenue)
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY month;
This statement populates your monthly metrics table in one pass. (Grouping by the output alias works in PostgreSQL and MySQL; on engines that disallow it, repeat the DATE_TRUNC expression in the GROUP BY clause.)
3. Best Practices for Batching Large Imports
Loading millions of rows in a single transaction risks long locks and potential rollbacks. Break imports into manageable batches.
3.1 Use Smaller Batches
-- Pseudocode in a procedural loop
FOR batch_start IN 1..total_rows STEP 5000 LOOP
INSERT INTO target_table (...)
SELECT ...
FROM source_table
WHERE id BETWEEN batch_start AND batch_start + 4999;
COMMIT; -- commit each batch
END LOOP;
This approach:
Reduces transaction size.
Prevents excessive lock contention.
Enables partial progress in case of errors.
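The batching loop above can be sketched concretely. This is a minimal illustration using Python’s sqlite3 (the table names, batch size, and contiguous id range are assumptions for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE target_table (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany(
    "INSERT INTO source_table (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 10_001)],
)
conn.commit()

BATCH = 5000  # tune to your lock-contention and latency budget
total = conn.execute("SELECT MAX(id) FROM source_table").fetchone()[0]

for batch_start in range(1, total + 1, BATCH):
    conn.execute(
        """
        INSERT INTO target_table (id, payload)
        SELECT id, payload FROM source_table
        WHERE id BETWEEN ? AND ?
        """,
        (batch_start, batch_start + BATCH - 1),
    )
    conn.commit()  # commit each batch: small transactions, partial progress

copied = conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0]
print(copied)  # 10000
```

If a batch fails mid-run, everything committed so far is preserved, and the loop can resume from the failed batch rather than restarting the whole load.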
3.2 Disable Indexes and Constraints Temporarily
For very large loads, consider:
-- SQL Server example
ALTER INDEX idx_mytable_col1 ON mytable DISABLE;
-- perform bulk load…
ALTER INDEX idx_mytable_col1 ON mytable REBUILD;
Disabling indexes speeds up inserts, but remember to rebuild them and refresh statistics afterward. (PostgreSQL has no DISABLE option for indexes; the equivalent technique is to DROP the index before the load and CREATE it again afterward.)
3.3 Leverage Native Bulk Utilities
Many RDBMS offer specialized tools:
PostgreSQL:
COPY table_name FROM 'file.csv' DELIMITER ',' CSV HEADER;
MySQL:
LOAD DATA INFILE 'file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' IGNORE 1 LINES;
SQL Server:
the bcp utility or BULK INSERT
These tools bypass SQL parsing overhead and write directly to storage.
4. Verifying Inserted Data Before Committing
Validation ensures data integrity and catches mistakes early.
4.1 Use Explicit Transactions
BEGIN;
INSERT INTO employees (...) VALUES (...);
INSERT INTO employees (...) VALUES (...);
-- Preview the rows you’ve added
SELECT employee_id, first_name, last_name
FROM employees
WHERE email LIKE '%@example.com';
-- If everything looks good:
COMMIT;
-- Otherwise:
-- ROLLBACK;
By manually controlling COMMIT and ROLLBACK, you retain safe checkpoints.
4.2 Leverage RETURNING or OUTPUT Clauses
Capture inserted rows immediately:
PostgreSQL:
INSERT INTO orders (customer_id, order_date, total_amount)
VALUES (123, CURRENT_DATE, 250.00)
RETURNING order_id, total_amount;
SQL Server:
INSERT INTO orders (customer_id, order_date, total_amount)
OUTPUT inserted.order_id, inserted.total_amount
VALUES (123, GETDATE(), 250.00);
This feedback loop confirms exactly what was written.
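Application drivers expose the same feedback loop even when RETURNING is unavailable. As a hedged sketch with Python’s sqlite3 (standing in for your production driver), the cursor’s lastrowid attribute plays the role of RETURNING order_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL,
        total_amount REAL NOT NULL
    )
""")

cur = conn.execute(
    "INSERT INTO orders (customer_id, total_amount) VALUES (?, ?)",
    (123, 250.00),
)
conn.commit()

# The generated key of the row just inserted, without a second query
new_id = cur.lastrowid
print(new_id)  # 1
```

Prefer the database-side RETURNING/OUTPUT clauses where supported; driver-side attributes like this are a fallback, and they capture only the key rather than arbitrary columns.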
4.3 Row Counts and Checksums
For bulk loads, compare source and target row counts:
-- After batch insert
SELECT COUNT(*) FROM staging_table;
SELECT COUNT(*) FROM target_table WHERE load_batch = '20250815';
Or compute a checksum on key columns. Order the aggregation explicitly, or the concatenation order (and therefore the hash) is nondeterministic:
SELECT MD5(string_agg(col1 || col2, '' ORDER BY col1, col2)) AS source_checksum
FROM staging_table;
SELECT MD5(string_agg(col1 || col2, '' ORDER BY col1, col2)) AS target_checksum
FROM target_table
WHERE load_batch = '20250815';
Matching checksums give high confidence in data fidelity.
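The same comparison can be scripted outside the database. A minimal sketch with Python’s hashlib standing in for MD5/string_agg (table and column names are illustrative):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_table (col1 TEXT, col2 TEXT);
    CREATE TABLE target_table  (col1 TEXT, col2 TEXT, load_batch TEXT);
""")
rows = [("a", "1"), ("b", "2"), ("c", "3")]
conn.executemany("INSERT INTO staging_table VALUES (?, ?)", rows)
conn.executemany("INSERT INTO target_table VALUES (?, ?, '20250815')", rows)

def checksum(cursor):
    # Hash rows in a deterministic order, or the digests won't be comparable
    digest = hashlib.md5()
    for col1, col2 in cursor:
        digest.update((col1 + col2).encode())
    return digest.hexdigest()

source = checksum(conn.execute(
    "SELECT col1, col2 FROM staging_table ORDER BY col1, col2"))
target = checksum(conn.execute(
    "SELECT col1, col2 FROM target_table "
    "WHERE load_batch = '20250815' ORDER BY col1, col2"))
print(source == target)  # True
```

A mismatch tells you the load diverged somewhere; it does not tell you which row, so follow up with a row-level diff (e.g. an EXCEPT query) when checksums disagree.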
5. Schema Integrity and Error Handling
Insert operations must respect your schema’s constraints and triggers.
5.1 Respect NOT NULL and UNIQUE Constraints
Always supply values for NOT NULL columns or rely on defaults.
Handle potential duplicate keys in upsert scenarios (see Chapter 10 for ON CONFLICT/MERGE).
5.2 Use Parameterized Queries in Applications
Avoid SQL injection and ensure correct data typing:
# Example in Python psycopg2
cur.execute(
"INSERT INTO employees (first_name, last_name, email) VALUES (%s, %s, %s)",
(first_name, last_name, email)
)
5.3 Log and Monitor Errors
Capture exceptions in your application or stored procedures.
Persist failed rows to an error table for later analysis.
-- PL/pgSQL example
BEGIN
INSERT INTO orders (...)
VALUES (...);
EXCEPTION WHEN OTHERS THEN
INSERT INTO orders_errors (order_data, error_message, created_at)
VALUES (ROW(...), SQLERRM, CURRENT_TIMESTAMP);
END;
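The same pattern works at the application layer. A hedged sketch using Python’s sqlite3 (the orders_errors table and its columns are illustrative): failed rows are diverted to an error table instead of aborting the whole load:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    CREATE TABLE orders_errors (
        order_data    TEXT,
        error_message TEXT,
        created_at    TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")

incoming = [(1, 99.50), (2, None), (3, 10.00)]  # row 2 violates NOT NULL

for row in incoming:
    try:
        conn.execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # Persist the failed row and the reason for later analysis
        conn.execute(
            "INSERT INTO orders_errors (order_data, error_message) "
            "VALUES (?, ?)",
            (repr(row), str(exc)),
        )
conn.commit()

ok = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
bad = conn.execute("SELECT COUNT(*) FROM orders_errors").fetchone()[0]
print(ok, bad)  # 2 1
```

Review the error table after each load; a sudden spike in rejected rows usually signals an upstream schema or data-quality change rather than a few bad records.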
Conclusion
Chapter 9 equips you to add new records confidently:
Master the basic INSERT … VALUES for single- or multi-row loads.
Leverage INSERT … SELECT for efficient bulk copying and transformations.
Follow batching best practices (smaller transactions, disabled indexes, native utilities) to scale large imports.
Verify your inserts with transactions, RETURNING/OUTPUT clauses, and checksum comparisons.
Protect your schema by respecting constraints, using parameterized queries, and logging errors.
By embedding these patterns into your ETL workflows, administrative scripts, and application code, you’ll maintain data integrity, ensure high availability, and prevent schema breakage as your database grows. Now you’re ready to move on to Chapter 10, where we’ll tackle updating and deleting existing records.