A step-by-step guide to BigQuery partitioning

Bigquery partitioning is an effective way to optimize the performance of your queries in Google’s BigQuery But what exactly is BigQuery, and how can partitioning speed up data analysis?

BigQuery is a fully-managed, serverless enterprise data warehouse offered as part of the Google Cloud Platform. It allows you to run fast SQL queries using the processing power of Google’s infrastructure.

However, as your data grows over time, query performance can slow down. This is where partitioning comes in. Since it divides your large tables into smaller partitions, partitioning can organize your data and accelerate queries. 

Check out this step-by-step guide to BigQuery partitioning to understand how it works and how you can create partitioned tables to optimize your data analytics.

How BigQuery Partitioning Works

In BigQuery, you can break up large tables into smaller, more manageable chunks of data called partitions. The goal is to store related data together in partitions, separated from unrelated data in other partitions. When you query a table, BigQuery only scans the partitions related to that query instead of the entire table.

For example, let’s say you have a table of financial data spanning 10 years, with 10 million rows overall. With BigQuery partitioning, you could divide this table by year into 10 partitions – one for each year.

When you query data from a specific year, BigQuery will retrieve data from only one partition, reducing the amount of scanning required for queries.

As a result, BigQuery will return results faster than it would take to scan the entire table. Since you are charged for the number of bytes processed by each query, you would also be able to save money by not scanning irrelevant data.

How to create partitioned tables in BigQuery

To demonstrate how we can create partitioned tables, we will use a dataset containing information about rising search terms in Google Trends.

BigQuery partitioning 1<br />

I’ll write a SQL query to filter data from the “rising_terms” table and return specific columns based on these criteria: the score is over 99 and date is between 2018-01-01 and 2019-01-01.

Note: if you want to learn how to write basic SQL queries, check out this article on BigQuery syntax.

Here’s what the query would look like:

SELECT term, score, week, rank, country_name

FROM `bigquery-dataset-396013.Google_trends.rising_terms` 

WHERE week BETWEEN ‘2018-01-01’ AND ‘2019-01-01’ 

AND score > 99

LIMIT 1000;

Let’s break it down:

SELECT term, score, week, rank, country_name – these are the columns I want BigQuery to return.

FROM `bigquery-dataset-396013.Google_trends.rising_terms` – this is the table to query the data from.

WHERE week BETWEEN ‘2018-01-01’ AND ‘2019-01-01’ – this clause filters the rows to only those where week is between these two dates.

AND score > 99 – this is an additional filter to only include rows where the score is greater than 99.

LIMIT 1000; – this clause limits the results to a maximum of 1000 rows.

BigQuery partitioning 2

After executing this query, BigQuery returns the columns and rows that fall into meet these criteria. When I ran the query, it had to scan all the rows to return the data that fall under these criteria.

This is where BigQuery partitioning comes in handy. By dividing the table into smaller partitions, BigQuery can scan only the segments that match the criteria I established when I write the query.

In order to partition the table, I’ll use the following syntax:

CREATE TABLE table_name 

PARTITION BY partition_expression

I’ll write a query to create a partitioned table using a SELECT statement to populate it with data from an existing table. This query creates a new table called rising_terms_by_year in the specified dataset. In addition, it specifies that this new table should be partitioned by the week column. It also selects the specified columns from the existing rising_terms table to pull data from.

CREATE TABLE `bigquery-dataset-396013.Google_trends.rising_terms_by_year` 

PARTITION BY week AS

SELECT term, score, week, rank, country_name

FROM `bigquery-dataset-396013.Google_trends.rising_terms`

As you can see, BigQuery has created a partitioned table called “rising_terms_by_year”.

A step-by-step guide to BigQuery partitioning

This is how you create partitioned tables in BigQuery. By creating partitioned tables, you can break up large tables into smaller, more manageable chunks of data. This enables BigQuery to return results more qiuckly and efficiently.

If you want to learn more about BigQuery, check out this article on how to use BigQuery aggregate functions.

Ready to streamline your spreadsheet data?

You may also like…

Best ETL tools in 2024

Best ETL tools in 2024

In the ever-evolving landscape of data analytics and business intelligence, ETL tools play a...