Skip to main content

BI Project Scenario

Scenario

You are a BI analyst for a grocery store chain monitoring dietary trends that influence in‑store purchases. Your company wants you to analyze which types of Hass avocados are purchased most often. Avocados are categorized into four sizes—small, medium, large, and extra large—and each sale includes the average price, total volume, and date.

Using this dataset, you will create a historical table to demonstrate how partitions and clusters work in BigQuery. Your goal is to answer the question:

What is the distribution of avocado sales from 2015 to 2021?

Create a Baseline Table (No Partition, No Cluster)

Start by creating a new table without partitions or clustering. This baseline will help you compare performance later. Name the table avocados.

BigQuery interface showing SQL editor

After running the SQL, your table should look like this:

BigQuery table preview for baseline avocado table

Create a Partitioned Table

Next, create a table partitioned by an integer range representing the years 2015–2022. Name this table avocados_partitioned.

Return to the SQL editor, delete the previous query, paste the new SQL, and click Run.

BigQuery partition creation example

Your partitioned table should now appear like this:

BigQuery partitioned table preview

Create a Partitioned and Clustered Table

Now create a table that is both partitioned by year and clustered by type. Name this table avocados_clustered.

BigQuery clustered table creation

The resulting clustered table should look like this:

BigQuery clustered table preview

Query the Tables and Compare Performance

Query the Non‑Partitioned Table

Querying non-partitioned table in BigQuery

Query the Partitioned Table

Querying partitioned table in BigQuery

Query the Partitioned and Clustered Table

Querying partitioned and clustered table in BigQuery

Visualize the Results

Number of Avocados Sold per Year

Chart showing number of avocados sold per year

Total Volume per Year

Chart showing total avocado volume per year

Average Price per Year

Chart showing average avocado price per year

Conclusion

By creating baseline, partitioned, and partitioned‑clustered tables, you can clearly observe how BigQuery improves performance through efficient data pruning and clustering. Partitioning reduces scanned data, while clustering improves filtering and sorting within partitions.

This workflow is essential for BI professionals working with large datasets. For more foundational concepts, explore:

With partitions and clusters, you can deliver faster insights, reduce costs, and scale your analytical workflows efficiently.

Comments