The Scatter Plot: How to analyse the relationship between two variables in Looker Studio

The scatter plot is ideal for analysing the relationship between two variables. In this post, we’ll explore how to set up this visualisation and what configuration options it offers.

If you need a refresher, see the post on how to add charts to your report in Looker Studio.

1. Example

In this example, we’ll work with the Spotify dataset. We’ll analyse the relationship between the number of times a song has been played and how many Spotify playlists it has been added to. We’ll see if we can confirm that the more times a song is played, the more playlists it gets saved to.

To create the chart, you need to use the song name (track_name) as the dimension, streams as the X-axis metric, and the number of Spotify playlists (in_Spotify_playlists) as the Y-axis metric.
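To make the role of each field concrete, here is a minimal Python sketch of what the chart conceptually does with this configuration: one point per dimension value, with the two metrics supplying its coordinates. The sample rows are invented for illustration, and summing the metrics per track is an assumption about the aggregation rather than something taken from the dataset or from Looker Studio itself.

```python
from collections import defaultdict

# Hypothetical rows: the values are made up, not taken from the Spotify dataset.
rows = [
    {"track_name": "Song A", "streams": 100, "in_Spotify_playlists": 10},
    {"track_name": "Song B", "streams": 250, "in_Spotify_playlists": 30},
    {"track_name": "Song A", "streams": 50, "in_Spotify_playlists": 5},
]


def scatter_points(rows, dimension, x_metric, y_metric):
    """Return {dimension value: (x, y)}, summing both metrics per value.

    Summation is an assumed aggregation, chosen only for this sketch.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[dimension]][0] += row[x_metric]
        totals[row[dimension]][1] += row[y_metric]
    return {name: (x, y) for name, (x, y) in totals.items()}


# One (x, y) point per song: streams on the X-axis, playlists on the Y-axis.
points = scatter_points(rows, "track_name", "streams", "in_Spotify_playlists")
```

Each key in `points` corresponds to one dot on the chart, which is why the dimension should identify the thing you want to plot (here, a song).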

The result is a scatter plot showing the relationship between the number of times a song has been played and how often it has been added to a Spotify playlist.

As the chart suggests, there is indeed a relationship between the two metrics. Generally, the more plays a song has, the more it tends to appear in Spotify playlists. In statistics, this kind of relationship is called correlation, and here the two variables appear to be strongly positively correlated.
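If you'd like to put a number on that impression, the usual measure is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it in plain Python. The function name `pearson_r` and the sample values are illustrative assumptions, not figures from the Spotify dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up values standing in for streams and playlist counts.
streams = [10, 20, 30, 40, 50]
playlists = [1, 3, 2, 5, 4]
r = pearson_r(streams, playlists)  # 0.8 for this sample
```

Values close to 1 mean the points lie near an upward-sloping line, values close to -1 mean a downward-sloping one, and values near 0 mean no clear linear relationship.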

Correlation and scatter plots are topics that can be explored in great detail, but to stay focused on the purpose of this post, I’ll leave a link here where you can learn more about it if you’re interested.

Let’s look at another example where the correlation is lower. We’ll change the Y-axis metric to in_Deezer_playlists.

Now, our scatter plot shows the relationship between how often a song has been played and how often it has been added to a Deezer playlist.

In this case, the trendline is much flatter than before, and the correlation is lower.
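One caveat when reading trendlines: the slope of the line and the strength of the correlation are different things. The slope tells you how fast Y grows with X, while the correlation depends on how tightly the points cluster around the line. The following sketch, using made-up numbers and a hypothetical helper `slope_and_r`, shows two series with very different slopes but the same correlation.

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation) for the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


xs = [10, 20, 30, 40, 50]
steep = [10, 30, 20, 50, 40]  # illustrative values only
flat = [1, 3, 2, 5, 4]        # the same pattern, scaled down by 10

steep_slope, steep_r = slope_and_r(xs, steep)
flat_slope, flat_r = slope_and_r(xs, flat)
# The slopes differ by a factor of 10, yet both correlations are equal.
```

So a flatter line on its own is not proof of a weaker correlation. Look at the spread of the points around the line as well.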

Would you like to know how to add a trendline?

Let’s explore that in the next section.

2. Chart-Specific Customisation Options

Within Style, you can add a trendline, choose the type (linear, exponential, or logarithmic), and customise its format. You’ll find Trendline with a dropdown below it where you can select the type.
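To give a feel for how the three trendline types named above differ, here is a rough Python sketch that fits each one with ordinary least squares. It is only an illustration of the shapes involved: the function names are hypothetical, the exponential and logarithmic fits use the common log-transform approach, and nothing here is meant to describe how Looker Studio actually computes its trendlines.

```python
import math


def _linear_fit(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = suv / suu
    return mean_v - slope * mean_u, slope


def fit_trendline(xs, ys, kind="linear"):
    """Return a function x -> predicted y for the chosen trendline type."""
    if kind == "linear":
        # y = a + b * x
        a, b = _linear_fit(xs, ys)
        return lambda x: a + b * x
    if kind == "exponential":
        # y = a * exp(b * x), fitted as ln(y) = ln(a) + b * x (needs y > 0)
        ln_a, b = _linear_fit(xs, [math.log(y) for y in ys])
        return lambda x: math.exp(ln_a + b * x)
    if kind == "logarithmic":
        # y = a + b * ln(x), fitted against ln(x) (needs x > 0)
        a, b = _linear_fit([math.log(x) for x in xs], ys)
        return lambda x: a + b * math.log(x)
    raise ValueError(f"unknown trendline type: {kind}")


# Illustrative data that follows a straight line exactly.
xs = [1, 2, 3, 4]
linear = fit_trendline(xs, [3, 5, 7, 9], kind="linear")
prediction = linear(5)  # extends the line y = 1 + 2x to x = 5
```

As a rule of thumb, a linear trendline suits data that grows at a steady rate, an exponential one suits data that grows faster and faster, and a logarithmic one suits data that rises quickly and then levels off.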

There are two types of scatter plots: one that Looker Studio calls a scatter chart, which represents data with points, and another called a bubble chart, which represents the data, as the name suggests, with bubbles.

In Style, you can change the colour of the bubbles. Under Number of bubbles, you’ll find Bubble colour. If you open the dropdown and select one of the dimensions you’ve used, the different values will be assigned a specific colour. This can be based on either the order of the bubbles or the value of the dimensions.

After making these changes in Style, we now have a bubble chart where the track_name dimension determines the bubble colour.

The scatter plot is definitely one of my favourite visualisations. What do you think? Do you like it, or do you find it generally hard to read?
