If you’re in the business of pulling regular reports in Google Analytics for clients or senior management, you’re likely no stranger to advanced filtering. Sure, you can get your data relatively painlessly with the standard “Containing” or “Begins With” filters, but what if you could provide much more precise reporting with a few easy filtering techniques? That’s where regular expressions (or “RegEx” for short) can really save the day. Today, we’re going to look at a few powerful RegEx filters within Google Analytics that can get you the data you want faster.
If you’re new to the concept entirely, first check out Google’s guide to regular expressions, which breaks down the function of each RegEx character. That’s ok, we’ll wait.
Ready to see some examples in action? Let’s get going.
1. RegEx for Finding a Date in a URL String
For many news-oriented sites, it’s common practice to include the publish date in the web address. That creates a convenient hook for RegEx filtering. For example, a news story for January 5th, 2015 may have the following URL:
Now let’s say we wanted to pull a report to only show articles that were published in January 2015. With our handy RegEx filtering, that might look something like this in Google Analytics:
Looks complex, but here’s what the RegEx is saying in layman’s terms:
Match every URL that contains the text “-201501” followed by two more digits. Those two digits can be a 0 followed by 1-9 OR (“|”) a digit from 1-3 followed by a digit from 0-9.
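If you'd like to try that logic outside of Google Analytics, here's a minimal Python sketch. The pattern is my reconstruction from the plain-English description above, and the page paths are made up for illustration, so treat both as assumptions rather than the exact filter or URLs from the report.

```python
import re

# Assumed pattern, rebuilt from the description above: "-201501" followed by
# a two-digit day, either 01-09 or a digit 1-3 followed by a digit 0-9.
JANUARY_2015 = re.compile(r"-201501(0[1-9]|[1-3][0-9])")

# Hypothetical page paths; the real site's URL structure may differ.
pages = [
    "/news/apple-harvest-20150105",
    "/news/grape-prices-20150131",
    "/news/plum-season-20150205",
    "/news/year-in-review-20141231",
]

# re.search finds the pattern anywhere in the path, like an unanchored filter.
january_pages = [p for p in pages if JANUARY_2015.search(p)]
print(january_pages)
```

One thing to keep in mind: as written, the pattern also accepts impossible days such as 32 through 39. That's harmless if those URLs never exist, but a stricter day group like `(0[1-9]|[12][0-9]|3[01])` would rule them out.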
Note the important distinction of including “0” before [1-9], as this particular website uses two digits for all days. Now let’s say the date appears in an alternate format, YYYYDM:
Here’s how that might look as a RegEx filter:
In this case, the RegEx is requesting something slightly different:
Match everything that contains the text “-2015” at the beginning and “1” at the end (“$”), with some other stuff in the middle. That “other stuff” can be the numbers 1-9 OR (“|”) two-digit numbers beginning with 1-3 and ending in 0-9.
With one filter, you’ve got all the content published for that month.
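Here's the same idea as a small Python sketch for the YYYYDM format. Again, the pattern is inferred from the description above and the paths are hypothetical, so this is an approximation of the filter rather than a copy of it.

```python
import re

# Assumed pattern: "-2015", then a day (1-9, or a digit 1-3 followed by 0-9),
# then the month "1" at the very end of the path.
YDM_JANUARY_2015 = re.compile(r"-2015([1-9]|[1-3][0-9])1$")

# Hypothetical page paths using year + day + month with no zero padding.
pages = [
    "/news/apple-harvest-201551",   # 5 January 2015
    "/news/grape-prices-2015301",   # 30 January 2015
    "/news/plum-season-201552",     # 5 February 2015
    "/news/cider-guide-2015512",    # 5 December 2015
]

january_pages = [p for p in pages if YDM_JANUARY_2015.search(p)]
print(january_pages)
```

The `$` is doing real work here: without it, the December path would slip through, because “-201551” appears inside “-2015512”.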
*A note to the keen eye: two separate dates can be formatted the same way here, e.g. 2014111 can be Jan 11, 2014 or Nov 1, 2014. This filter isn’t foolproof, but it’s pretty darn close, and I’d consider this an edge case. By my count, this can happen with five date strings in any given year: 2014111, 2014112, 2014211, 2014212 and 2014311.
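If you want to check that count yourself, a quick brute-force sketch works: build the unpadded year + day + month string for every day of the year and look for strings that more than one date produces. The function name is my own, and the format is assumed to be exactly as described above (no zero padding on day or month).

```python
from collections import defaultdict
from datetime import date, timedelta


def ambiguous_ydm_strings(year):
    """Return the YYYYDM strings that more than one calendar date produces."""
    seen = defaultdict(list)
    day = date(year, 1, 1)
    while day.year == year:
        # No zero padding: 11 January 2014 becomes "2014111".
        seen[f"{day.year}{day.day}{day.month}"].append(day)
        day += timedelta(days=1)
    return {key: dates for key, dates in seen.items() if len(dates) > 1}


collisions = ambiguous_ydm_strings(2014)
for key in sorted(collisions):
    print(key, [d.strftime("%b %d") for d in collisions[key]])
```

Each colliding string pairs a date in January or February with one in November or December, which is why a January filter can pick up a handful of early-November articles.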
2. RegEx for Finding a Subfolder in a URL String
Now, let’s say I only wanted to see results for all articles on apples. Within this website’s URL structure, any article on this topic appears in the subfolder “/apples”. If I weren’t familiar with RegEx, I could simply use the “Begins With” filter, specify “/apples”, and end up with the same result.
But what if I wanted to see results for the subfolders “/apples”, “/grapes” OR “/plums”? If I applied three regular filters, Google Analytics would treat each as an “and” condition, which isn’t what I want. Instead, I can use the following RegEx filter to get my answer:
In plain English, this is telling Google Analytics: Show me pages with the subfolder “/apples”, “/grapes” OR “/plums”. This makes use of a few RegEx characters: the caret (“^”), which only shows results that begin with this string, and the pipe (“|”), which acts as an “or” match.
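To see the caret and pipe working together, here's a short Python sketch. The pattern is one plausible way to write the filter described above (the original may have been spelled differently), and the page paths are invented for the example.

```python
import re

# Assumed pattern: the path must start with one of the three subfolders.
SUBFOLDERS = re.compile(r"^/(apples|grapes|plums)")

# Hypothetical page paths, shown without a hostname.
pages = [
    "/apples/honeycrisp-review",
    "/grapes/",
    "/plums/jam-recipe",
    "/pears/poached",
    "/recipes/apples/pie",
]

matched = [p for p in pages if SUBFOLDERS.search(p)]
print(matched)
```

The caret keeps “/recipes/apples/pie” out of the results, since that path only contains “/apples” rather than starting with it. If a sibling folder like “/applesauce” exists, adding a trailing slash to the pattern, as in `^/(apples|grapes|plums)/`, would keep it from matching too.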
It’s worth noting: this assumes the profile in question isn’t displaying the hostname for these pages. That said, it can be very useful to view full page URLs for a profile, especially if you have multiple subdomains.
3. RegEx for Finding Misspellings or Keyword Variations
As much as we value knowing which keywords are driving visits to our site, our users’ search queries aren’t always perfect. While Google’s autocomplete algorithm will pick up on and rectify most typos, the misspelled keyword can still appear in Google Analytics verbatim.
Let’s say the keyword “apples” is a top traffic driver for our website, but often misspelled. Using a wildcard like the asterisk (*), you can account for all of these variations:
With this expression, “p” can occur zero or more times in the word. So you’ll see aples, appples, ales (oops), and the correctly spelled apples. Neat, huh?
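Here's a rough Python sketch of that behavior. I'm assuming the expression is `ap*les`, which fits the description above, and the keyword list is made up.

```python
import re

# Assumed pattern: "a", then zero or more "p" characters, then "les".
APPLES_VARIANTS = re.compile(r"ap*les")

# Hypothetical search keywords.
keywords = ["aples", "apples", "appples", "ales", "apple pie", "organic apples"]

matched = [k for k in keywords if APPLES_VARIANTS.search(k)]
print(matched)
```

Because the match is unanchored and allows zero p's, it will also catch unrelated keywords that merely contain “ales”, such as “sales figures”. Swapping `*` for `+` (one or more) would require at least one “p”.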
Similarly, the “?” wildcard will account for zero or one instances of the preceding character. This could be handy if we wanted to know if users were searching “fruit” or “fruits” to find our website (though the RegEx reads like an ill-formatted Jeopardy answer):
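A quick Python sketch of the “?” quantifier, assuming the expression is built around `fruits?`. I've added the `^` and `$` anchors myself to make the difference visible, and the keywords are hypothetical.

```python
import re

# Assumed pattern: "fruit" with an optional trailing "s", anchored at both
# ends so that only the exact keywords "fruit" and "fruits" match.
FRUIT_OR_FRUITS = re.compile(r"^fruits?$")

# Hypothetical search keywords.
keywords = ["fruit", "fruits", "fruitss", "fruit salad", "fresh fruits"]

exact = [k for k in keywords if FRUIT_OR_FRUITS.search(k)]
print(exact)
```

Without the anchors, `fruits?` matches any keyword that contains “fruit”, so on its own it behaves much like a plain “Containing” filter. The anchors are what narrow it down to just the singular and plural forms.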
I hope this got you thinking of some smarter ways to filter your Google Analytics data. What are some of your favorite RegEx concepts? Share them in the comments!