
Part IV: Modifying Data – Chapter 9: Inserting Records

 


Adding new data into your database is as critical as querying it. A solid INSERT strategy prevents downtime, avoids schema breakage, and ensures data accuracy from day one. In this chapter, we’ll cover:

  • Basic INSERT INTO … VALUES syntax

  • Bulk inserts using INSERT INTO … SELECT

  • Best practices for batching large imports

  • Verifying inserted data before committing

By the end, you’ll have a reliable workflow for populating your tables safely and efficiently.

1. Basic INSERT INTO … VALUES Syntax

The simplest way to add a row is with the INSERT … VALUES statement. Always specify columns explicitly to guard against schema changes.

1.1 Syntax Structure

sql
INSERT INTO table_name (col1, col2, ..., colN)
VALUES (val1, val2, ..., valN);

  • table_name: target table

  • (col1, …, colN): columns to populate, in the same order as the supplied values

  • VALUES: literal values matching each column’s data type

1.2 Example: Single-Row Insert

Imagine an employees table:

sql
CREATE TABLE employees (
  employee_id   SERIAL PRIMARY KEY,
  first_name    VARCHAR(50) NOT NULL,
  last_name     VARCHAR(50) NOT NULL,
  email         VARCHAR(100) UNIQUE,
  hire_date     DATE DEFAULT CURRENT_DATE
);

Insert a new employee record:

sql
INSERT INTO employees (first_name, last_name, email)
VALUES ('Alice', 'Martinez', 'alice.martinez@example.com');

  • employee_id auto-increments.

  • hire_date defaults to today’s date.

1.3 Inserting Multiple Rows

Most RDBMSs allow batching multiple rows in a single VALUES clause:

sql
INSERT INTO employees (first_name, last_name, email)
VALUES
  ('Bob',   'Smith',    'bob.smith@example.com'),
  ('Carol', 'Johnson',  'carol.johnson@example.com'),
  ('Dave',  'Williams', 'dave.williams@example.com');

This reduces round-trips to the database and speeds up small inserts.
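The same multi-row pattern is available from application code via the driver's batch API. A minimal sketch using Python's built-in sqlite3 module and an in-memory database (both are illustrative stand-ins, not part of this chapter's setup):

```python
import sqlite3

# In-memory database stands in for a real employees table
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        email       TEXT UNIQUE
    )
""")

rows = [
    ("Bob",   "Smith",    "bob.smith@example.com"),
    ("Carol", "Johnson",  "carol.johnson@example.com"),
    ("Dave",  "Williams", "dave.williams@example.com"),
]
# executemany sends all three rows in one statement instead of three round-trips
conn.executemany(
    "INSERT INTO employees (first_name, last_name, email) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # → 3
```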

2. Bulk Inserts Using INSERT INTO … SELECT

To load large volumes or copy data from another table, use INSERT … SELECT. This method leverages the database engine’s set-based processing for maximum throughput.

2.1 Syntax Structure

sql
INSERT INTO target_table (col1, col2, …)
SELECT colA, colB, …
FROM source_table
WHERE <filter_conditions>;

2.2 Example: Archiving Old Orders

Suppose you periodically archive orders older than one year:

sql
INSERT INTO orders_archive (order_id, customer_id, order_date, total_amount)
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date < CURRENT_DATE - INTERVAL '1 year';

After confirming the archive, you can delete them from the live table.
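The archive-then-delete sequence is safest inside a single transaction, so a failure leaves both tables untouched. A minimal sketch with Python's sqlite3 (table names, columns, and the fixed cutoff date are illustrative):

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL);
    CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2023-01-15", 99.0), (2, "2025-06-01", 40.0)],
)

# One year before an illustrative "today"
cutoff = (date(2025, 8, 15) - timedelta(days=365)).isoformat()
with conn:  # one transaction: both statements commit together or neither does
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
        (cutoff,),
    )
    conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))

print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])  # → 1
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])          # → 1
```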

2.3 Transforming Data on Insert

You can apply functions and joins within the SELECT to reshape data:

sql
INSERT INTO monthly_revenue (month, total_revenue)
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(total_amount)                AS total_revenue
FROM orders
GROUP BY month;

This statement populates your monthly metrics table in one pass.

3. Best Practices for Batching Large Imports

Loading millions of rows in a single transaction risks long locks and potential rollbacks. Break imports into manageable batches.

3.1 Use Smaller Batches

sql
-- Pseudocode in a procedural loop
FOR batch_start IN 1..total_rows STEP 5000 LOOP
  INSERT INTO target_table (...)
  SELECT ...
  FROM source_table
  WHERE id BETWEEN batch_start AND batch_start + 4999;
  COMMIT;  -- commit each batch
END LOOP;

This approach:

  • Reduces transaction size.

  • Prevents excessive lock contention.

  • Enables partial progress in case of errors.
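The pseudocode above can be made concrete in a driver loop. A sketch with Python's sqlite3, copying 12,000 rows in batches of 5,000 by id range (the table names, row count, and batch size are illustrative):

```python
import sqlite3

BATCH = 5000
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE target_table (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 12001)],  # 12,000 rows to copy
)
conn.commit()

total_rows = conn.execute("SELECT MAX(id) FROM source_table").fetchone()[0]
for batch_start in range(1, total_rows + 1, BATCH):
    conn.execute(
        "INSERT INTO target_table SELECT id, payload FROM source_table "
        "WHERE id BETWEEN ? AND ?",
        (batch_start, batch_start + BATCH - 1),
    )
    conn.commit()  # commit each batch so partial progress survives a failure

print(conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0])  # → 12000
```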

3.2 Disable Indexes and Constraints Temporarily

For very large loads, consider:

sql
-- PostgreSQL has no ALTER INDEX … DISABLE; drop the index and recreate it
DROP INDEX idx_mytable_col1;
-- perform bulk load…
CREATE INDEX idx_mytable_col1 ON mytable (col1);

Dropping indexes speeds up inserts, but remember to recreate them and refresh planner statistics (for example with ANALYZE) afterward. SQL Server offers ALTER INDEX … DISABLE / REBUILD for the same purpose.

3.3 Leverage Native Bulk Utilities

Many RDBMS offer specialized tools:

  • PostgreSQL: COPY table_name FROM 'file.csv' DELIMITER ',' CSV HEADER; (the path is read on the server; use psql’s \copy for client-side files)

  • MySQL: LOAD DATA INFILE 'file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' IGNORE 1 LINES;

  • SQL Server: bcp utility or BULK INSERT

These tools bypass SQL parsing overhead and write directly to storage.

4. Verifying Inserted Data Before Committing

Validation ensures data integrity and catches mistakes early.

4.1 Use Explicit Transactions

sql
BEGIN;

INSERT INTO employees (...) VALUES (...);
INSERT INTO employees (...) VALUES (...);

-- Preview the rows you’ve added
SELECT employee_id, first_name, last_name
FROM employees
WHERE email LIKE '%@example.com';

-- If everything looks good:
COMMIT;

-- Otherwise:
-- ROLLBACK;

By manually controlling COMMIT and ROLLBACK, you retain safe checkpoints.
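From application code, the same checkpoint pattern becomes a try/except around the commit. A minimal sqlite3 sketch (the one-column schema is illustrative); the duplicate key aborts the whole batch, including the first, otherwise valid, insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (email TEXT PRIMARY KEY)")

try:
    cur = conn.cursor()
    cur.execute("INSERT INTO employees VALUES ('alice@example.com')")
    cur.execute("INSERT INTO employees VALUES ('alice@example.com')")  # duplicate!
    conn.commit()          # only reached if every insert succeeded
except sqlite3.IntegrityError:
    conn.rollback()        # undo the whole batch, including the first insert

print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # → 0
```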

4.2 Leverage RETURNING or OUTPUT Clauses

Capture inserted rows immediately:

  • PostgreSQL:

    sql
    INSERT INTO orders (customer_id, order_date, total_amount)
    VALUES (123, CURRENT_DATE, 250.00)
    RETURNING order_id, total_amount;
    
  • SQL Server:

    sql
    INSERT INTO orders (customer_id, order_date, total_amount)
    OUTPUT inserted.order_id, inserted.total_amount
    VALUES (123, GETDATE(), 250.00);
    

This feedback loop confirms exactly what was written.
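Many drivers expose a similar feedback channel even without RETURNING support; in Python's sqlite3, for example, the cursor reports the generated key via lastrowid. A sketch (the orders schema is illustrative, and this is the driver-level mechanism rather than the RETURNING clause itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id  INTEGER,
        total_amount REAL
    )
""")

cur = conn.execute(
    "INSERT INTO orders (customer_id, total_amount) VALUES (?, ?)",
    (123, 250.00),
)
conn.commit()
print(cur.lastrowid)  # → 1, the auto-generated order_id
```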

4.3 Row Counts and Checksums

For bulk loads, compare source and target row counts:

sql
-- After batch insert
SELECT COUNT(*) FROM staging_table;
SELECT COUNT(*) FROM target_table WHERE load_batch = '20250815';

Or compute a checksum on key columns:

sql
SELECT MD5(string_agg(col1 || col2, '' ORDER BY col1, col2)) AS source_checksum
FROM staging_table;

SELECT MD5(string_agg(col1 || col2, '' ORDER BY col1, col2)) AS target_checksum
FROM target_table
WHERE load_batch = '20250815';

Matching checksums give high confidence in data fidelity. Note the ORDER BY inside string_agg: without it the aggregation order is undefined, so two identical tables could still produce different checksums.
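The same idea works client-side when MD5 or string_agg is unavailable in the database. A sketch with Python's hashlib over a deterministic row ordering (the sqlite3 tables and columns are illustrative):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_table (col1 TEXT, col2 TEXT);
    CREATE TABLE target_table  (col1 TEXT, col2 TEXT);
""")
rows = [("a", "1"), ("b", "2"), ("c", "3")]
conn.executemany("INSERT INTO staging_table VALUES (?, ?)", rows)
conn.executemany("INSERT INTO target_table  VALUES (?, ?)", rows)

def table_checksum(table):
    # ORDER BY makes the concatenation deterministic, as a checksum requires
    data = conn.execute(
        f"SELECT col1 || col2 FROM {table} ORDER BY col1, col2"
    ).fetchall()
    return hashlib.md5("".join(r[0] for r in data).encode()).hexdigest()

print(table_checksum("staging_table") == table_checksum("target_table"))  # → True
```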

5. Schema Integrity and Error Handling

Insert operations must respect your schema’s constraints and triggers.

5.1 Respect NOT NULL and UNIQUE Constraints

  • Always supply values for NOT NULL columns or rely on defaults.

  • Handle potential duplicate keys in upsert scenarios (see Chapter 10 for ON CONFLICT / MERGE).
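Both constraint classes surface as driver-level errors your code can catch. A small sqlite3 sketch showing what violating each one looks like (the two-column schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        first_name TEXT NOT NULL,
        email      TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO employees VALUES ('Alice', 'alice@example.com')")

errors = []
# NOT NULL violation: first_name is missing and has no default
try:
    conn.execute("INSERT INTO employees (email) VALUES ('bob@example.com')")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

# UNIQUE violation: duplicate email
try:
    conn.execute("INSERT INTO employees VALUES ('Eve', 'alice@example.com')")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

print(errors)  # one NOT NULL and one UNIQUE constraint failure
```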

5.2 Use Parameterized Queries in Applications

Avoid SQL injection and ensure correct data typing:

python
# Example with Python and psycopg2 (connection parameters are illustrative)
import psycopg2

conn = psycopg2.connect("dbname=mydb user=app")
cur = conn.cursor()
cur.execute(
    "INSERT INTO employees (first_name, last_name, email) VALUES (%s, %s, %s)",
    (first_name, last_name, email),
)
conn.commit()

5.3 Log and Monitor Errors

  • Capture exceptions in your application or stored procedures.

  • Persist failed rows to an error table for later analysis.

sql
-- PostgreSQL: an exception handler needs a PL/pgSQL block, e.g. via DO
DO $$
BEGIN
  INSERT INTO orders (...)
  VALUES (...);
EXCEPTION WHEN OTHERS THEN
  INSERT INTO orders_errors (order_data, error_message, created_at)
  VALUES (ROW(...), SQLERRM, CURRENT_TIMESTAMP);
END;
$$;

Conclusion

Chapter 9 equips you to add new records confidently:

  • Master the basic INSERT … VALUES for single or multi-row loads.

  • Leverage INSERT … SELECT for efficient bulk copying and transformations.

  • Follow batching best practices—smaller transactions, disabled indexes, native utilities—to scale out large imports.

  • Verify your inserts with transactions, RETURNING / OUTPUT clauses, and checksum comparisons.

  • Protect your schema by respecting constraints, using parameterized queries, and logging errors.

By embedding these patterns into your ETL workflows, administrative scripts, and application code, you’ll maintain data integrity, ensure high availability, and prevent schema breakage as your database grows. Now you’re ready to move on to Chapter 10, where we’ll tackle updating and deleting existing records.
