Scenario
You are a BI analyst for a grocery store chain monitoring dietary trends that influence in‑store purchases. Your company wants you to analyze which types of Hass avocados are purchased most often. Avocados are categorized into four sizes—small, medium, large, and extra large—and each sale includes the average price, total volume, and date.
Using this dataset, you will create a historical table to demonstrate how partitions and clusters work in BigQuery. Your goal is to answer the question:
What is the distribution of avocado sales from 2015 to 2021?
Create a Baseline Table (No Partition, No Cluster)
Start by creating a new table without partitions or clustering. This baseline will help you compare performance later. Name the table avocados.
After running the SQL, your table should look like this:
Create a Partitioned Table
Next, create a table partitioned by an integer range representing the years 2015–2022. Name this table avocados_partitioned.
Return to the SQL editor, delete the previous query, paste the new SQL, and click Run.
Your partitioned table should now appear like this:
Create a Partitioned and Clustered Table
Now create a table that is both partitioned by year and clustered by type. Name this table avocados_clustered.
The resulting clustered table should look like this:
Query the Tables and Compare Performance
Query the Non‑Partitioned Table
Query the Partitioned Table
Query the Partitioned and Clustered Table
Visualize the Results
Number of Avocados Sold per Year
Total Volume per Year
Average Price per Year
Conclusion
By creating baseline, partitioned, and partitioned‑clustered tables, you can clearly observe how BigQuery improves performance through efficient data pruning and clustering. Partitioning reduces scanned data, while clustering improves filtering and sorting within partitions.
This workflow is essential for BI professionals working with large datasets. For more foundational concepts, explore:
With partitions and clusters, you can deliver faster insights, reduce costs, and scale your analytical workflows efficiently.
Comments
Post a Comment