
Part IV: Modifying Data – Chapter 9: Inserting Records

 


Adding new data into your database is as critical as querying it. A solid INSERT strategy prevents downtime, avoids schema breakage, and ensures data accuracy from day one. In this chapter, we’ll cover:

  • Basic INSERT INTO … VALUES syntax

  • Bulk inserts using INSERT INTO … SELECT

  • Best practices for batching large imports

  • Verifying inserted data before committing

By the end, you’ll have a reliable workflow for populating your tables safely and efficiently.

1. Basic INSERT INTO … VALUES Syntax

The simplest way to add a row is with the INSERT … VALUES statement. Always specify columns explicitly to guard against schema changes.

1.1 Syntax Structure

sql
INSERT INTO table_name (col1, col2, ..., colN)
VALUES (val1, val2, ..., valN);

  • table_name: target table

  • (col1, …, colN): columns to populate, in the same order as the supplied values

  • VALUES: literal values matching each column’s data type

1.2 Example: Single-Row Insert

Imagine an employees table:

sql
CREATE TABLE employees (
  employee_id   SERIAL PRIMARY KEY,
  first_name    VARCHAR(50) NOT NULL,
  last_name     VARCHAR(50) NOT NULL,
  email         VARCHAR(100) UNIQUE,
  hire_date     DATE DEFAULT CURRENT_DATE
);

Insert a new employee record:

sql
INSERT INTO employees (first_name, last_name, email)
VALUES ('Alice', 'Martinez', 'alice.martinez@example.com');

  • employee_id auto-increments.

  • hire_date defaults to today’s date.

1.3 Inserting Multiple Rows

Most RDBMSs allow batching multiple rows in a single VALUES clause:

sql
INSERT INTO employees (first_name, last_name, email)
VALUES
  ('Bob',   'Smith',    'bob.smith@example.com'),
  ('Carol', 'Johnson',  'carol.johnson@example.com'),
  ('Dave',  'Williams', 'dave.williams@example.com');

This reduces round-trips to the database and speeds up small inserts.
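The same multi-row pattern is available from application code via the driver's batch API. A minimal sketch using Python's built-in sqlite3 module and an in-memory database (both are illustrative stand-ins, not part of this chapter's setup):

```python
import sqlite3

# In-memory database stands in for a real employees table
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        email       TEXT UNIQUE
    )
""")

rows = [
    ("Bob",   "Smith",    "bob.smith@example.com"),
    ("Carol", "Johnson",  "carol.johnson@example.com"),
    ("Dave",  "Williams", "dave.williams@example.com"),
]
# executemany sends all three rows in one statement instead of three round-trips
conn.executemany(
    "INSERT INTO employees (first_name, last_name, email) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # → 3
```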

2. Bulk Inserts Using INSERT INTO … SELECT

To load large volumes or copy data from another table, use INSERT … SELECT. This method leverages the database engine’s set-based processing for maximum throughput.

2.1 Syntax Structure

sql
INSERT INTO target_table (col1, col2, …)
SELECT colA, colB, …
FROM source_table
WHERE <filter_conditions>;

2.2 Example: Archiving Old Orders

Suppose you periodically archive orders older than one year:

sql
INSERT INTO orders_archive (order_id, customer_id, order_date, total_amount)
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date < CURRENT_DATE - INTERVAL '1 year';

After confirming the archive, you can delete them from the live table.
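The archive-then-delete sequence is safest inside a single transaction, so a failure leaves both tables untouched. A minimal sketch with Python's sqlite3 (table names, columns, and the fixed cutoff date are illustrative):

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL);
    CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2023-01-15", 99.0), (2, "2025-06-01", 40.0)],
)

# One year before an illustrative "today"
cutoff = (date(2025, 8, 15) - timedelta(days=365)).isoformat()
with conn:  # one transaction: both statements commit together or neither does
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
        (cutoff,),
    )
    conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))

print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])  # → 1
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])          # → 1
```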

2.3 Transforming Data on Insert

You can apply functions and joins within the SELECT to reshape data:

sql
INSERT INTO monthly_revenue (month, total_revenue)
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(total_amount)                AS total_revenue
FROM orders
GROUP BY month;

This statement populates your monthly metrics table in one pass.

3. Best Practices for Batching Large Imports

Loading millions of rows in a single transaction risks long locks and potential rollbacks. Break imports into manageable batches.

3.1 Use Smaller Batches

sql
-- Pseudocode in a procedural loop
FOR batch_start IN 1..total_rows STEP 5000 LOOP
  INSERT INTO target_table (...)
  SELECT ...
  FROM source_table
  WHERE id BETWEEN batch_start AND batch_start + 4999;
  COMMIT;  -- commit each batch
END LOOP;

This approach:

  • Reduces transaction size.

  • Prevents excessive lock contention.

  • Enables partial progress in case of errors.
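The pseudocode above can be made concrete in a driver loop. A sketch with Python's sqlite3, copying 12,000 rows in batches of 5,000 by id range (the table names, row count, and batch size are illustrative):

```python
import sqlite3

BATCH = 5000
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE target_table (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 12001)],  # 12,000 rows to copy
)
conn.commit()

total_rows = conn.execute("SELECT MAX(id) FROM source_table").fetchone()[0]
for batch_start in range(1, total_rows + 1, BATCH):
    conn.execute(
        "INSERT INTO target_table SELECT id, payload FROM source_table "
        "WHERE id BETWEEN ? AND ?",
        (batch_start, batch_start + BATCH - 1),
    )
    conn.commit()  # commit each batch so partial progress survives a failure

print(conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0])  # → 12000
```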

3.2 Disable Indexes and Constraints Temporarily

For very large loads, consider:

sql
-- PostgreSQL has no ALTER INDEX … DISABLE; drop the index and recreate it
DROP INDEX idx_mytable_col1;
-- perform bulk load…
CREATE INDEX idx_mytable_col1 ON mytable (col1);

Dropping indexes speeds up inserts, but remember to recreate them and refresh planner statistics (for example with ANALYZE) afterward. SQL Server offers ALTER INDEX … DISABLE / REBUILD for the same purpose.

3.3 Leverage Native Bulk Utilities

Many RDBMS offer specialized tools:

  • PostgreSQL: COPY table_name FROM 'file.csv' DELIMITER ',' CSV HEADER; (the path is read on the server; use psql’s \copy for client-side files)

  • MySQL: LOAD DATA INFILE 'file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' IGNORE 1 LINES;

  • SQL Server: bcp utility or BULK INSERT

These tools bypass SQL parsing overhead and write directly to storage.

4. Verifying Inserted Data Before Committing

Validation ensures data integrity and catches mistakes early.

4.1 Use Explicit Transactions

sql
BEGIN;

INSERT INTO employees (...) VALUES (...);
INSERT INTO employees (...) VALUES (...);

-- Preview the rows you’ve added
SELECT employee_id, first_name, last_name
FROM employees
WHERE email LIKE '%@example.com';

-- If everything looks good:
COMMIT;

-- Otherwise:
-- ROLLBACK;

By manually controlling COMMIT and ROLLBACK, you retain safe checkpoints.
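From application code, the same checkpoint pattern becomes a try/except around the commit. A minimal sqlite3 sketch (the one-column schema is illustrative); the duplicate key aborts the whole batch, including the first, otherwise valid, insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (email TEXT PRIMARY KEY)")

try:
    cur = conn.cursor()
    cur.execute("INSERT INTO employees VALUES ('alice@example.com')")
    cur.execute("INSERT INTO employees VALUES ('alice@example.com')")  # duplicate!
    conn.commit()          # only reached if every insert succeeded
except sqlite3.IntegrityError:
    conn.rollback()        # undo the whole batch, including the first insert

print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # → 0
```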

4.2 Leverage RETURNING or OUTPUT Clauses

Capture inserted rows immediately:

  • PostgreSQL:

    sql
    INSERT INTO orders (customer_id, order_date, total_amount)
    VALUES (123, CURRENT_DATE, 250.00)
    RETURNING order_id, total_amount;
    
  • SQL Server:

    sql
    INSERT INTO orders (customer_id, order_date, total_amount)
    OUTPUT inserted.order_id, inserted.total_amount
    VALUES (123, GETDATE(), 250.00);
    

This feedback loop confirms exactly what was written.
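Many drivers expose a similar feedback channel even without RETURNING support; in Python's sqlite3, for example, the cursor reports the generated key via lastrowid. A sketch (the orders schema is illustrative, and this is the driver-level mechanism rather than the RETURNING clause itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id  INTEGER,
        total_amount REAL
    )
""")

cur = conn.execute(
    "INSERT INTO orders (customer_id, total_amount) VALUES (?, ?)",
    (123, 250.00),
)
conn.commit()
print(cur.lastrowid)  # → 1, the auto-generated order_id
```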

4.3 Row Counts and Checksums

For bulk loads, compare source and target row counts:

sql
-- After batch insert
SELECT COUNT(*) FROM staging_table;
SELECT COUNT(*) FROM target_table WHERE load_batch = '20250815';

Or compute a checksum on key columns:

sql
SELECT MD5(string_agg(col1 || col2, '' ORDER BY col1, col2)) AS source_checksum
FROM staging_table;

SELECT MD5(string_agg(col1 || col2, '' ORDER BY col1, col2)) AS target_checksum
FROM target_table
WHERE load_batch = '20250815';

Matching checksums give high confidence in data fidelity. Note the ORDER BY inside string_agg: without it the aggregation order is undefined, so two identical tables could still produce different checksums.
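The same idea works client-side when MD5 or string_agg is unavailable in the database. A sketch with Python's hashlib over a deterministic row ordering (the sqlite3 tables and columns are illustrative):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_table (col1 TEXT, col2 TEXT);
    CREATE TABLE target_table  (col1 TEXT, col2 TEXT);
""")
rows = [("a", "1"), ("b", "2"), ("c", "3")]
conn.executemany("INSERT INTO staging_table VALUES (?, ?)", rows)
conn.executemany("INSERT INTO target_table  VALUES (?, ?)", rows)

def table_checksum(table):
    # ORDER BY makes the concatenation deterministic, as a checksum requires
    data = conn.execute(
        f"SELECT col1 || col2 FROM {table} ORDER BY col1, col2"
    ).fetchall()
    return hashlib.md5("".join(r[0] for r in data).encode()).hexdigest()

print(table_checksum("staging_table") == table_checksum("target_table"))  # → True
```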

5. Schema Integrity and Error Handling

Insert operations must respect your schema’s constraints and triggers.

5.1 Respect NOT NULL and UNIQUE Constraints

  • Always supply values for NOT NULL columns or rely on defaults.

  • Handle potential duplicate keys in upsert scenarios (see Chapter 10 for ON CONFLICT / MERGE).
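Both constraint classes surface as driver-level errors your code can catch. A small sqlite3 sketch showing what violating each one looks like (the two-column schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        first_name TEXT NOT NULL,
        email      TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO employees VALUES ('Alice', 'alice@example.com')")

errors = []
# NOT NULL violation: first_name is missing and has no default
try:
    conn.execute("INSERT INTO employees (email) VALUES ('bob@example.com')")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

# UNIQUE violation: duplicate email
try:
    conn.execute("INSERT INTO employees VALUES ('Eve', 'alice@example.com')")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

print(errors)  # one NOT NULL and one UNIQUE constraint failure
```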

5.2 Use Parameterized Queries in Applications

Avoid SQL injection and ensure correct data typing:

python
# Example with Python and psycopg2 (connection parameters are illustrative)
import psycopg2

conn = psycopg2.connect("dbname=mydb user=app")
cur = conn.cursor()
cur.execute(
    "INSERT INTO employees (first_name, last_name, email) VALUES (%s, %s, %s)",
    (first_name, last_name, email),
)
conn.commit()

5.3 Log and Monitor Errors

  • Capture exceptions in your application or stored procedures.

  • Persist failed rows to an error table for later analysis.

sql
-- PostgreSQL: an exception handler needs a PL/pgSQL block, e.g. via DO
DO $$
BEGIN
  INSERT INTO orders (...)
  VALUES (...);
EXCEPTION WHEN OTHERS THEN
  INSERT INTO orders_errors (order_data, error_message, created_at)
  VALUES (ROW(...), SQLERRM, CURRENT_TIMESTAMP);
END;
$$;

Conclusion

Chapter 9 equips you to add new records confidently:

  • Master the basic INSERT … VALUES for single or multi-row loads.

  • Leverage INSERT … SELECT for efficient bulk copying and transformations.

  • Follow batching best practices—smaller transactions, disabled indexes, native utilities—to scale out large imports.

  • Verify your inserts with transactions, RETURNING / OUTPUT clauses, and checksum comparisons.

  • Protect your schema by respecting constraints, using parameterized queries, and logging errors.

By embedding these patterns into your ETL workflows, administrative scripts, and application code, you’ll maintain data integrity, ensure high availability, and prevent schema breakage as your database grows. Now you’re ready to move on to Chapter 10, where we’ll tackle updating and deleting existing records.
