Regular Expressions for creating calculated fields in Looker Studio

This post is a bit different from the others. This time, we won’t dive into Looker Studio itself, but rather review something that can greatly simplify our lives when creating fields within the tool: regular expressions.

They are so important that I think it’s essential to understand what they are and see examples before we get into creating custom dimensions.

1. What are Regular Expressions?

Regular expressions (regex) are powerful tools that allow us to find a combination of characters within a text using patterns. This might be hard to grasp if you’re hearing about them for the first time, so let’s look at an example.

Imagine you conducted a survey about your (hypothetical) restaurant, and one question allows for free-text responses. If 5,000 people answered, you may be particularly interested in those who mentioned your Instagram profile.

Respondents might refer to Instagram in various ways: capitalised, lowercase, with abbreviations, or including “.com.” You could create a filter, as discussed in the post about filters in Looker Studio, and add each condition one by one. However, to save time, you can use regular expressions.

Here are some of the basic regular expressions I frequently use:

2. Basic Regular Expressions

Vertical Bar |

  • Means “or.”
  • Example: instagram|instagram.com|Instagram|Instagram.com|IG|ig
  • Matches: instagram OR instagram.com OR Instagram OR Instagram.com OR IG OR ig When creating a new filter, you can select the condition “Matches regular expression” and include the above example instead of adding a line for every possible way Instagram can be written.

Dot and Asterisk .*

  • Matches any character or set of characters.
  • Example: instagram.*|Instagram.*|IG|ig
  • Matches: The same as the previous example. Instagram.* works for both “Instagram” and “Instagram.com” since .* will match anything that follows.

Dot .

  • Matches any single character.
  • Example: .nstagram.*|IG|ig
  • Matches: Similar to the previous example. Instead of writing the social media name with both uppercase and lowercase “i,” you can use a dot to represent any letter.

Caret ^

  • Matches texts that start with the letters following this character.
  • Example: ^Instagram
  • Matches: All free-text responses that begin with “Instagram.”

Dollar Sign $

  • Matches texts that end with the letters preceding this symbol.
  • Example: Instagram$
  • Matches: All responses that end with the word “Instagram.”

The goal in sharing these regular expressions isn’t for you to memorise them all, but to familiarise yourself with the ones I find most useful. So far, we’ve covered only five, but there are many more.

3. Intermediate Regular Expressions

Here are two slightly more advanced regular expressions that I won’t ask you to memorise but that will help you understand the expressions we’ll see in the next chapter.

Backslash followed by d \d

  • The backslash before a character prevents it from functioning as a regular expression. For example, to search for all comments that mention the dollar sign, you must use \$ instead of just $, as Looker Studio will interpret $ as a regex symbol.
  • The backslash followed by “d” (\d) matches a single digit: “0”, “1”, “2”, …, “9”. It can also be represented as [0-9].
  • To specify one or more digits, use \d+ or [0-9]+. To specify exactly two digits, use \d{2}. Note: If you forget the backslash and write d{2}, the regex will match the letter “d” appearing twice in a row (i.e., “dd”).

Backslash followed by w \w

  • Functions similarly to the previous case but matches alphanumeric characters and underscores.
  • Can also be expressed as [0-9a-zA-Z], but this doesn’t include underscores.
  • [a-z] matches a single lowercase character from “a” to “z.”
  • [A-Z] matches a single uppercase character from “A” to “Z.”
  • Adding + at the end specifies that it can match one or more characters.

There are countless regular expressions, and these are just a handful. I recommend trying to remember the examples shared here, as they will assist you in creating calculated fields in Looker Studio.

4. Formulas for Using Regular Expressions

When creating fields, Looker Studio offers a series of formulas that you will use with regular expressions:

REGEXP_MATCH(field, regex)

  • Returns TRUE if the field values exactly match the regex; otherwise, returns FALSE.
  • Example: REGEXP_MATCH(Open-ended question, 'Instagram')
    • Returns TRUE if the open-ended question is exactly “Instagram.” Since it’s unlikely the answer will contain only “Instagram,” you can use a regex to find responses with text before or after “Instagram”: REGEXP_MATCH(Open-ended question, '.*Instagram.*')
    Note: Text strings are always enclosed in single quotes.

REGEXP_CONTAINS(field, regex)

  • Returns TRUE if the field values contain the regex; otherwise, returns FALSE.
  • Example:REGEXP_CONTAINS(Open-ended question, 'Instagram')
    • Returns TRUE for all responses containing “Instagram.” It doesn’t have to match exactly.

REGEXP_EXTRACT(field, regex)

  • Returns the field values that exactly match the regex.
  • Example: To extract the first directory from a URL: REGEXP_EXTRACT(URL, '^https://[^/]+/([^/]+)/')
    • For the URL https://elpais.com/educacion/los-conocimientos-son-competencias.html, this would return “educacion.”

REGEXP_REPLACE(field, regex, replacement text)

  • Replaces field values that match the regex with the new replacement text.
  • Example: To correct the misspelling “Instagran” to “Instagram”:
    REGEXP_REPLACE(Open-ended question, 'Instagran', 'Instagram')

CASE WHEN

  • You can use regex with this formula, but it can also work without them. I often use it with the previous four formulas, but they don’t have to go together.
  • It’s a more advanced version of IF, allowing for multiple conditions. Structure:
   CASE WHEN field condition value THEN result
   WHEN field condition value THEN result
   ELSE result if none of the previous conditions are met
   END

Example:

   CASE WHEN product = 't-shirt' THEN 'Clothing'
   WHEN product = 'skirt' THEN 'Clothing'
   WHEN product = 'shoe' THEN 'Footwear'
   ELSE 'Other category'
   END

Regular expressions and formulas may seem complicated, but we’ll put them into practice in the next post by creating custom dimensions in Looker Studio, and you’ll see that they are much easier than they appear.

Leave a Comment