Thursday, September 1, 2016

Product list optimization project, part 1.3. GA tuning: Pagetype dimension

This post is a little diversion from the main course of this project.

Earlier I worked on another UX research project whose goal was to discover behavioral patterns in users’ sessions in order to improve the information architecture of the website. My approach was to apply hierarchical clustering to a number of sessions. But it’s hard to find any patterns when you parametrize a session by the URLs of its pages: there is simply too much diversity.

So I came up with the idea of a new dimension, Pagetype. Roughly speaking, it’s the place (coordinate) of a page in the website’s information architecture. It looks like this:
“<Top-level section of the website> <level X> <particular type of the page>”
For example, “catalog level 4 product list”, “catalog level 5 product page”, “news level 2”, etc.
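
To make the format concrete, here is a minimal sketch that assembles such a value. The function names and the rule "level = number of path segments" are assumptions for illustration only; a real site may need quite different rules to place a page in its architecture.

```javascript
// Hypothetical helper: join the parts of a Pagetype value, skipping an empty page kind.
function buildPagetype(section, level, pageKind) {
  var parts = [section, 'level ' + level];
  if (pageKind) {
    parts.push(pageKind);
  }
  return parts.join(' ');
}

// Assumption: the first path segment names the section and the number of
// path segments gives the level. This is illustrative, not the post's rule.
function pagetypeFromPath(path, pageKind) {
  var segments = path.split('/').filter(function (s) {
    return s.length > 0;
  });
  if (segments.length === 0) {
    return buildPagetype('home', 1, pageKind);
  }
  return buildPagetype(segments[0], segments.length, pageKind);
}

// Made-up URLs, chosen to reproduce two of the examples above.
var listPagetype = pagetypeFromPath('/catalog/tools/drills/cordless', 'product list');
var newsPagetype = pagetypeFromPath('/news/some-article');
```

With these made-up paths, listPagetype is “catalog level 4 product list” and newsPagetype is “news level 2”.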

With some magic on the distance measure and Ward's minimum variance method for the clustering, this gives very interesting outcomes for the information architecture analysis.

But let’s get back to our project!
I’ve mentioned the Pagetype dimension because it helped in this product list project too. The website I’ve been working on has product lists on different levels of the catalog, from 2 to 5. So I used the Pagetype custom dimension to separate product list pages from pages of other types.

This website happened to have a slightly weird CMS: only JavaScript running after the page has loaded can decide whether it is a product list or not. So I made a dedicated “Window loaded” tag in Google Tag Manager to track the Pagetype custom dimension.
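
As a rough sketch of what such a tag could do: wait for the load event, inspect the page, and push the value into the Data Layer. The CSS selectors, the event name pagetypeReady, the pagetype key and the hard-coded section and level are all hypothetical. In GTM itself, similar code would presumably live in a Custom HTML tag fired on a Window Loaded trigger.

```javascript
// Hypothetical selectors; replace them with whatever marks these pages on your site.
function detectPageKind(doc) {
  if (doc.querySelector('.product-list')) {
    return 'product list';
  }
  if (doc.querySelector('.product-card')) {
    return 'product page';
  }
  return '';
}

// Push the value so that a GTM Data Layer Variable can pick it up.
// The event name and the key are assumptions, not GTM built-ins.
function pushPagetype(dataLayer, section, level, pageKind) {
  var pagetype = (section + ' level ' + level + ' ' + pageKind).trim();
  dataLayer.push({ event: 'pagetypeReady', pagetype: pagetype });
  return pagetype;
}

// Browser-only wiring; section and level are hard-coded here for brevity.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.addEventListener('load', function () {
    window.dataLayer = window.dataLayer || [];
    pushPagetype(window.dataLayer, 'catalog', 4, detectPageKind(document));
  });
}
```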

I recommend having such a custom dimension if you work with behavioral patterns.

Tuesday, August 23, 2016

Product list optimization project, part 1.2. GA tuning: Pageview ID

One important addition to the custom dimensions mentioned earlier, which I came up with, is Pageview ID. In the web analytics universe, there is an entity hierarchy which looks like this:
Client => Session => Pageview => Hit (Event).
Thanks to the already assigned Client ID and Session ID, we can distinguish one entity from another at those levels.

But what about Pageview?
On the one hand, we have the page URL. On the other hand, a user’s trajectory over the website may be more complex than an analyst could even imagine, including opening lots of pages in browser tabs simultaneously, switching back and forth between different windows (which affects visibility status), multiple hits on the Back and Forward buttons, etc.
That’s why we need a Pageview ID, to which we can assign all these events and later gather them in reports under one Pageview entity.

I track Pageview ID through these steps:
  1. Make a Custom Dimension “Pageview ID” in Google Analytics and remember its index.
  2. Make a Custom (User-Defined) Data Layer Variable “Pageview ID” and point it to gtm.start. This is an automatically generated system value which appears when the GTM container snippet starts (i.e. when the pageview starts) and stays in the Data Layer for everything sent from this page (i.e. it escorts every event on the page). Its value is the system time of the GTM block start in milliseconds, like “1470900439977”.
  3. (See the picture) In every Pageview and Event Tag, add the Custom Dimension (recall its index) and set its value to the Custom Variable “Pageview ID” (like you probably did for Hit timestamp previously).
So that must be it. Now each event falls into the right Pageview set (entity), together with the other events that happened during the same pageview.
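
To illustrate what the variable resolves to, here is a minimal sketch that looks up gtm.start in a Data Layer array and attaches it to every hit. In GTM the Data Layer Variable does this lookup for you. The function names, the pageviewId key and the mock Data Layer are assumptions for illustration.

```javascript
// Find the gtm.start timestamp that the GTM container snippet pushes
// together with the 'gtm.js' event.
function findGtmStart(dataLayer) {
  for (var i = 0; i < dataLayer.length; i++) {
    var entry = dataLayer[i];
    if (entry && typeof entry['gtm.start'] === 'number') {
      return entry['gtm.start'];
    }
  }
  return undefined;
}

// Hypothetical helper: copy a hit and tag it with the Pageview ID.
function withPageviewId(dataLayer, hit) {
  var tagged = {};
  Object.keys(hit).forEach(function (key) {
    tagged[key] = hit[key];
  });
  tagged.pageviewId = String(findGtmStart(dataLayer));
  return tagged;
}

// Mock Data Layer shaped like the one produced by the GTM snippet.
var mockDataLayer = [
  { 'gtm.start': 1470900439977, event: 'gtm.js' },
  { event: 'gtm.dom' }
];
var scrollHit = withPageviewId(mockDataLayer, { event: 'scroll' });
var clickHit = withPageviewId(mockDataLayer, { event: 'click' });
```

Both hits carry the same pageviewId, so they can be grouped under one pageview in reports even when the same URL is open in several tabs.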

Tuesday, August 16, 2016

Product list optimization project, part 1.1. Google Analytics tuning: Custom dimensions

First of all, there is a common approach to managing a Google Analytics implementation: through Google Tag Manager. This method has many advantages, but discussing them is beyond the scope of this series of posts. From now on I assume this approach by default.

There is a great article by Simo Ahava, “Improve Data Collection With Four Custom Dimensions”. It’s about four parameters which aren’t among the default dimensions and metrics of the Google Analytics API, but are crucial for many web analytics tasks. I didn’t use User ID in this project, but Client ID, Session ID and Hit timestamp were very helpful.

For those of you who decide to implement these custom dimensions too, I want to warn you about a subtle mistake in the Hit timestamp setup. I’ve written a detailed comment about this under Simo’s article.

So in brief, when you configure the Custom JavaScript Variable, you can’t treat milliseconds like the other parts of the time (hours, minutes, seconds), because it’s a three-digit value. Otherwise, “0.089” becomes “0.89” and exceeds “0.123”, which leads to awkward results in the data, where some page events appear to precede their predecessors.

To fix this, you should add another function and apply it to milliseconds:

// Left-pad milliseconds with zeros to three digits, e.g. 89 -> "089"
var pad00 = function(num) {
  var norm = Math.abs(Math.floor(num));
  return (norm < 10 ? '00' : (norm < 100 ? '0' : '')) + norm;
};
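
To see the effect, here is a small self-contained sketch comparing the two paddings on two moments 34 ms apart. The two-digit helper (called pad0 here), formatSeconds and the sample dates are assumptions for illustration; only the three-digit padding comes from the post.

```javascript
// Two-digit padding, as typically used for hours, minutes and seconds (assumed).
function pad0(num) {
  var norm = Math.abs(Math.floor(num));
  return (norm < 10 ? '0' : '') + norm;
}

// Three-digit padding for milliseconds, as in the fix above.
function pad00(num) {
  var norm = Math.abs(Math.floor(num));
  return (norm < 10 ? '00' : (norm < 100 ? '0' : '')) + norm;
}

// Hypothetical helper: format seconds and milliseconds with a chosen padder.
function formatSeconds(date, padMs) {
  return pad0(date.getSeconds()) + '.' + padMs(date.getMilliseconds());
}

var earlier = new Date(2016, 7, 16, 12, 0, 5, 89);  // ...:05.089
var later = new Date(2016, 7, 16, 12, 0, 5, 123);   // ...:05.123

var wrongEarlier = formatSeconds(earlier, pad0);   // "05.89"
var wrongLater = formatSeconds(later, pad0);       // "05.123"
var rightEarlier = formatSeconds(earlier, pad00);  // "05.089"
var rightLater = formatSeconds(later, pad00);      // "05.123"
```

Compared as strings, wrongEarlier sorts after wrongLater, which is exactly the reversed order described above, while rightEarlier sorts before rightLater.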

Wednesday, August 10, 2016

Products list optimization project, intro

The next series of blog posts is dedicated to a UX research project I completed recently. The approach described could be useful for e-commerce websites and online stores, especially those with a large product catalog.

Motivation and Main idea.
Products in catalog subcategories get unequal shares of user attention. Those at the top of the list are seen by almost all users. The bottom of the list gets several times less attention (at least two times less). Most users don’t scroll through the whole list.
But as the statistics tell us, products at the top of the list often don’t attract users’ interest: users don’t click on them and don’t put them in the shopping cart. At the same time, products at the bottom of the list get much more interest.
So the main idea is to analyze the ratio of users’ interest in each product to its visibility (attention), and to put the most popular items at the top of the list in order to sell more.
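
As a toy illustration of that ratio, the sketch below divides clicks by views for each product and sorts by the result. The numbers are invented and the field names are assumptions; the actual analysis in this series is done on Google Analytics data in R.

```javascript
// Hypothetical per-product counts: how often the product was visible
// on screen (views) and how often it was clicked (clicks).
var products = [
  { name: 'A', position: 1, views: 1000, clicks: 20 },
  { name: 'B', position: 2, views: 900, clicks: 90 },
  { name: 'C', position: 3, views: 400, clicks: 100 }
];

// Interest per unit of attention, sorted from most to least interesting.
function rankByInterest(items) {
  return items
    .map(function (p) {
      return {
        name: p.name,
        position: p.position,
        interest: p.views > 0 ? p.clicks / p.views : 0
      };
    })
    .sort(function (a, b) {
      return b.interest - a.interest;
    });
}

var ranked = rankByInterest(products);
var suggestedOrder = ranked.map(function (p) {
  return p.name;
});
```

With this made-up data, product C sits at the bottom of the list but converts a quarter of its views into clicks, so it comes first in suggestedOrder.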

Summary of this series.
First, we tune our Google Analytics data collection process. We should track page scroll to measure each product’s visibility. We also need to gather information about clicks on the links to the product’s page and on the ‘Buy’ buttons for each product.
Second, we connect to the Google Analytics API from a statistical environment (RStudio in our case), retrieve the necessary information, do some exploratory data analysis and get the final report. This report tells us which permutations should be made in the subcategories of our product catalog.

So next time, we will start off with GA tuning.

Monday, June 6, 2016

Learning Python for Data Analysis and Visualization course finished!

It finally happened! I’ve finished the course Learning Python for Data Analysis and Visualization on Udemy, with over 100 lectures. A very good introduction to data science with Python. Lots of useful outbound links for further exploration. And lots of stuff to think about, like which tasks call for Python and which for R.

Sunday, April 10, 2016

Scroll depth tracking

Understanding the way users scroll pages on your site is important for measuring engagement. Combined with metrics like “time on page”, it gives you valuable information about people’s interest in your content.

I know, I know... If you google something like “scroll depth tracking”, you’ll find a ready-to-use Google Analytics plugin, which is apparently quite good. But what if you don’t want to involve jQuery, or you wish to tune every subtle detail, or you have another reason to reject a ready-made solution? Also, I don’t really like the idea of sending scroll stats (e.g. “50% achieved”) to GA before the user has finished working with the page.
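
To make that last point concrete, here is a minimal jQuery-free sketch of the general idea: keep only the maximum depth reached and report it once, when the user leaves. This is an illustration under assumptions, not this post's actual implementation. The event name scrollDepth and the maxScrollDepth key are hypothetical, and whether a push made at unload reliably reaches GA depends on the setup.

```javascript
// Percentage of the document height that the bottom of the viewport has reached.
function scrollDepthPercent(scrollTop, viewportHeight, documentHeight) {
  if (documentHeight <= 0) {
    return 0;
  }
  var reached = Math.min(scrollTop + viewportHeight, documentHeight);
  return Math.round((reached / documentHeight) * 100);
}

// Remembers the deepest point reached and reports it on demand.
function createScrollTracker() {
  var maxDepth = 0;
  return {
    update: function (scrollTop, viewportHeight, documentHeight) {
      var depth = scrollDepthPercent(scrollTop, viewportHeight, documentHeight);
      maxDepth = Math.max(maxDepth, depth);
      return maxDepth;
    },
    report: function (dataLayer) {
      dataLayer.push({ event: 'scrollDepth', maxScrollDepth: maxDepth });
    }
  };
}

// Browser-only wiring: measure on load and scroll, report once on leaving.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  var pageTracker = createScrollTracker();
  var measure = function () {
    pageTracker.update(
      window.pageYOffset,
      window.innerHeight,
      document.documentElement.scrollHeight
    );
  };
  window.dataLayer = window.dataLayer || [];
  window.addEventListener('load', measure);
  window.addEventListener('scroll', measure);
  window.addEventListener('beforeunload', function () {
    pageTracker.report(window.dataLayer);
  });
}
```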

Here is my approach.

Friday, March 11, 2016

Tracking categorical variables with Google Analytics and Google Tag Manager

Recently I was working with a magazine’s website, and one of the tasks was to build a report on the authors’ popularity. Surprisingly, this one turned out to be a bit challenging.

I needed to sum up all visits to all of the article pages for each particular author. Using these numbers, I would build a rating of authors. The problem is that any author can have many articles, and any article can be written by several authors (coauthors). In SQL terms, it’s a “Many-To-Many Relationship”.

First of all, ‘author’ is a categorical variable, since it takes discrete values of a string type, like an author’s first name concatenated with a surname. Therefore, we can’t use Custom Metrics in Google Analytics, since a metric can only be a number, time or currency. No dictionary option here.

On the other hand, Google Analytics offers Custom Dimensions for such cases. If you have implemented Google Tag Manager, you can just make a new custom dimension ‘author’ and pass the author’s name from each article’s page through the Data Layer. But not in our “Many-to-Many” case!

If an article was written by two or more authors, you can’t pass one long string with the authors’ names, like this: “A.Johns, B.Johnson, C.Jacobs”. This is not an option, because you need to make three (in this example) distinct database entries: one for each author.

So we have a categorical variable with multiple values for each pageview.
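
To picture the “one entry per author” requirement, here is a minimal sketch that turns a combined author string into separate Data Layer pushes. It only illustrates the shape of the problem and is not necessarily the solution this post goes on to describe; the event name authorView and the author key are hypothetical.

```javascript
// Split a combined author string into one Data Layer entry per author.
function authorEvents(authorsString) {
  return authorsString
    .split(',')
    .map(function (name) {
      return name.trim();
    })
    .filter(function (name) {
      return name.length > 0;
    })
    .map(function (name) {
      return { event: 'authorView', author: name };
    });
}

// A stand-in array; on a real page this would be window.dataLayer.
var demoDataLayer = [];
authorEvents('A.Johns, B.Johnson, C.Jacobs').forEach(function (entry) {
  demoDataLayer.push(entry);
});
```

Each of the three entries can then feed the ‘author’ custom dimension separately, so every coauthor gets credit for the same pageview.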

Here is my approach to handling the situation...