Analyzing data entails going through various statistical measures to understand the underlying nature and trends. Probably, one of the most commonly used statistical methods is Correlation. We use it to gauge the extent of relationship, dependence or closeness of two variables with respect to each other. And the metric used to calculate this dependence is called correlation coefficient. Given the two sets of variable data, we can calculate the Pearson product-moment correlation coefficient (r) using the CORREL formula in Google Sheets.
It is worth noting that the correlation coefficient r ranges from −1 to 1. When r = 1, it indicates the two variables are in a perfect linear relationship. And when r = 0, it implies that there is no correlation between the variables. A value of −1 implies that the variables are in a negative relationship i.e. if x increases, then the value of y decreases and vice versa. If you are interested in reading through some literature as to how we usually calculate the Pearson product-moment correlation coefficient, here’s the link to it.
- data_y – is the range of values or a reference to the range of cells that consists of the dependent data.
- data_x – is the range of values or a reference to the range of cells that consists of the independent data.
Usage: correlation formula in Google Sheets
Having gone through the syntax, we now understand that we need two sets of data for the formula to return a value. Now, let us go ahead and try our hands practically on Google Sheets application itself. Consider the screenshot below where we have used the correlation formula.
The parameters accept the input values in two different ways. We can either choose to input the direct numeric datasets using curly braces that indicate ranges (first example). Or we can simply use range references (second example).
Now, let us see how the formula behaves if x and y data were in a linear relationship (y = 10x + 5). Please see the following image. Since this is a direct dependency, the formula returns 1 for output.
What if we tried an equation with a negative slope (y = -5x+10)? We should see -1 in our result because this is a negative dependency. And, the snapshot below confirms it.