Friday, March 11, 2016

Tracking categorical variables with Google Analytics and Google Tag Manager

Recently I worked on a magazine’s website, and one of the tasks was to build a report on the authors’ popularity. Surprisingly, this turned out to be a bit challenging.

I needed to sum up all visits to all of the article pages for each author and then use those totals to rank the authors. The problem is that an author can have many articles, and an article can be written by several authors (coauthors). In SQL terms, this is a “many-to-many relationship”.
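To make the target report concrete, here is a minimal sketch of the aggregation itself, done offline on hypothetical data (the article paths, view counts, and the `author_rating` helper are all made up for illustration). It assumes that every visit to a coauthored article is credited in full to each of its coauthors, so the author totals can add up to more than the total number of visits.

```python
from collections import Counter

# Hypothetical sample data: pageviews per article path
article_views = {
    "/articles/ai-trends": 1200,
    "/articles/budget-tips": 800,
    "/articles/city-guide": 500,
}

# Hypothetical many-to-many mapping: each article has one or more authors
article_authors = {
    "/articles/ai-trends": ["A.Johns", "B.Johnson"],
    "/articles/budget-tips": ["B.Johnson"],
    "/articles/city-guide": ["A.Johns", "B.Johnson", "C.Jacobs"],
}


def author_rating(views, authors):
    """Credit every visit to an article in full to each of its authors,
    then return (author, total_visits) pairs sorted from most to least visited."""
    totals = Counter()
    for path, count in views.items():
        for author in authors.get(path, []):
            totals[author] += count
    return totals.most_common()


print(author_rating(article_views, article_authors))
# [('B.Johnson', 2500), ('A.Johns', 1700), ('C.Jacobs', 500)]
```

With a one-to-many mapping this would be a plain group-by; the inner loop over each article’s author list is what the many-to-many relationship adds.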

First of all, ‘author’ is a categorical variable: it takes discrete string values, such as an author’s first name concatenated with the surname. Therefore, we can’t use Google Analytics Custom Metrics, because a custom metric can only be a number, a time, or a currency value. There is no string option here.

On the other hand, Google Analytics offers Custom Dimensions for such cases. If you have implemented Google Tag Manager, you can simply create a new custom dimension, ‘author’, and pass the author’s name from each article page through the Data Layer. But that doesn’t work in our many-to-many case!

If an article was written by two or more authors, you can’t pass one long string with all of their names, like “A.Johns, B.Johnson, C.Jacobs”. Google Analytics would treat that whole string as a single dimension value, that is, as one “author”, whereas you need three distinct entries (in this example): one for each author.
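A small illustration of why the concatenated string breaks the report. The rows below are hypothetical, standing in for a report grouped by an ‘author’ dimension where coauthors were sent as one comma-separated value. Grouping by the raw value, as a report does, keeps the combined string as its own row, while the rating needs the visits credited to each individual name. The helper names are made up, and splitting exported rows like this is only a way to show the gap; it is not the tracking approach this post describes.

```python
from collections import Counter

# Hypothetical report rows: (value of the 'author' dimension, pageviews)
rows = [
    ("A.Johns, B.Johnson, C.Jacobs", 500),
    ("A.Johns", 300),
    ("B.Johnson", 800),
]


def totals_by_raw_value(rows):
    """Group by the dimension value exactly as sent:
    the concatenated string becomes a separate 'author'."""
    totals = Counter()
    for value, views in rows:
        totals[value] += views
    return dict(totals)


def totals_by_author(rows):
    """What the rating actually needs: visits credited to each individual name."""
    totals = Counter()
    for value, views in rows:
        for name in value.split(","):
            totals[name.strip()] += views
    return dict(totals)


print(totals_by_raw_value(rows))
print(totals_by_author(rows))
```

In the first result A.Johns gets only 300 visits and C.Jacobs doesn’t appear at all. In the second, A.Johns has 800, B.Johnson 1300 and C.Jacobs 500.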

So we have a categorical variable with multiple values for each pageview.

Here is my approach to handling this situation.