BigQuery joins: a step-by-step guide

When you use BigQuery joins, you can combine data from multiple tables. By joining tables together, you can query data based on relationships and correlations across datasets.

However, it’s important to mention that there are a few types of joins in BigQuery SQL: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. Each join type enables you to match rows between tables differently.

In this article, you’ll find a step-by-step tutorial on how to use BigQuery joins to combine data from multiple tables. 

Note: If you don’t know how to write basic SQL queries yet, check out this tutorial on how to use BigQuery SQL syntax before reading this article.

A guide to SQL Joins in BigQuery

Before we start writing queries using BigQuery joins, we need to understand how these four types of joins work and how you can use them to merge data in BigQuery.

INNER JOIN: the inner join matches rows between two tables and returns only the matching rows. Therefore, when you combine data, it will leave out all the unmatched rows.

LEFT JOIN: a left join matches all rows from the left table even if no matching row exists in the right table. Here, BigQuery will return all the rows from the left table and the matched rows from the right table.

RIGHT JOIN: it is the opposite of a left join. It matches all rows from the right table even if no matching row exists in the left table.

FULL JOIN: a full join combines left and right joins. All rows from both tables are returned regardless of whether they match.

Join types explained

Here’s a visual representation of these four join types.

BigQuery Joins 1

As you can see, the inner join will return only the matching rows, a left join will return all the rows from the left table, a right join will return all the rows from the right table, and a full join will return all the rows from both tables.

Here’s an example:

BigQuery Joins 2

Let’s say I have these two tables containing information about NCAA basketball teams. While these tables share the “Name” column, they also contain some different items. I’ll combine the data from these two tables, joining the shared “Name” column.

By using INNER JOIN, BigQuery would return only the matching rows. Here’s what the result would look like.

BigQuery Joins 3

As you can see, BigQuery returned the matching rows but left out all the unmatched rows.

If I use LEFT join, BigQuery will return all the rows from the left table. As a result, it will leave out all the unmatched rows from the right table.

BigQuery Joins 4

By using RIGHT JOIN, BigQuery will return all the rows from the right table instead, leaving out the unmatched rows from the left table.

BigQuery Joins 5

Lastly, FULL JOIN will return all rows from both tables, including unmatched rows on either side. 

BigQuery Joins 6

In all these cases, BigQuery would return the word “null” to fill in for all the missing matches.

Now that you know the differences between these types of join, we’ll learn how to combine data using these BigQuery joins.

INNER JOIN

First, I’ll use the INNER JOIN keyword because I want BigQuery to return only matching rows from two tables. Here’s the syntax I’ll use:

INNER JOIN Syntax

SELECT column_name(s)

FROM table1

INNER JOIN table2

ON table1.column_name = table2.column_name;

I’ll show you how to use BigQuery joins to combine data from these NCAA basketball datasets. While one of the tables contains information about NCAA teams (mbb_teams), the other table contains data about these teams’ mascots.

SELECT * 

FROM `bigquery-public-data.ncaa_basketball.mascots` AS mascots

INNER JOIN `bigquery-public-data.ncaa_basketball.mbb_teams` AS teams

ON mascots.name = teams.name;

Let’s break it down:

1) First, I want to select all columns from the results. I’ll use the SELECT * clause.

2) The FROM clause specifies the first table in the join. I’ve used an alias (AS) to give this table a temporary name (“mascots”).This enables me to use the alias later rather than write the full table name.

3) The INNER JOIN clause specifies the second table. I’ve also used an alias to name this table.

4) The ON statement is the condition matching the columns between the two tables. Instead of writing the table name, I used the aliases followed by the column: “name”.

BigQuery Joins 7

LEFT JOIN

After learning how to use the INNER JOIN statement, you can easily write a LEFT JOIN query by following the same pattern.

LEFT JOIN Syntax

SELECT column_name(s)

FROM table1

LEFT JOIN table2

ON table1.column_name = table2.column_name;

Here’s the complete formula.

SELECT * 

FROM `bigquery-public-data.ncaa_basketball.mascots` AS mascots

LEFT JOIN `bigquery-public-data.ncaa_basketball.mbb_teams` AS teams

ON mascots.name = teams.name;

BigQuery Joins 8

SELECT column_name(s)

FROM table1

RIGHT JOIN table2

ON table1.column_name = table2.column_name;

Here’s what the query would look like.

SELECT * 

FROM `bigquery-public-data.ncaa_basketball.mascots` AS mascots

RIGHT JOIN `bigquery-public-data.ncaa_basketball.mbb_teams` AS teams

ON mascots.name = teams.name;

BigQuery Joins 9

FULL JOIN

Lastly, we can use FULL JOIN to write a query to return all rows from both tables, regardless of whether they match. Here’s the FULL JOIN syntax.

FULL JOIN Syntax

SELECT column_name(s)

FROM table1

FULL JOIN table2

ON table1.column_name = table2.column_name;

That’s what the full query would look like.

SELECT * 

FROM `bigquery-public-data.ncaa_basketball.mascots` AS mascots

FULL JOIN `bigquery-public-data.ncaa_basketball.mbb_teams` AS teams

ON mascots.name = teams.name;

BigQuery Joins 10

Keep in mind that you can combine JOIN clauses with other statements. For example, I could combine these two tables, and use WHERE to filter specific data. Let’s say I want BigQuery to return only mascots starting with the letter “A”.

SELECT * 

FROM `bigquery-public-data.ncaa_basketball.mascots` AS mascots

FULL JOIN `bigquery-public-data.ncaa_basketball.mbb_teams` AS teams

ON mascots.name = teams.name

WHERE mascot LIKE ‘A%’;

BigQuery joins: a step-by-step guide

There you have it! That’s how you can use BigQuery joins to combine data from multiple tables. If you want to learn more about BigQuery, check out this article on how to use BigQuery SQL syntax to write your first queries.

Ready to streamline your spreadsheet data?

You may also like…

Best ETL tools in 2024

Best ETL tools in 2024

In the ever-evolving landscape of data analytics and business intelligence, ETL tools play a...