Tuesday, August 23, 2016

Product list optimization project, part 1.2. GA tuning: Pageview ID

One important addition to custom dimensions mentioned earlier, which I come up with, is Pageview ID. In webanalytics universe, there is an entity hierarchy which looks like this:
Client => Session => Pageview =>Hit (Event).
Due to already assigned Client ID and Session ID, we can distinguish one entity of that level from another.

But what about Pageview?
On the one hand, we have page URL. On the other hand, user’s trajectory over the website may be as complex as an analyst couldn't even imagine, including simultaneously opening lots of pages in browser tabs, switching back and forth between different windows (which affects visibility status), multiple hits on Back and Forward buttons, etc.
That’s why we need Pageview ID to which we can assign all these events and gather them later in reports under one Pageview entity.

I track Pageview ID through these steps:
  1. Make Custom Dimension “Pageview ID” in Google Analytics and remember its index.
  2. Make Custom (User-Defined) Data Layer variable “Pageview ID” and assign it to gtm.start. This is an automatically generated system variable which appears with each Pageview Tag (i.e. when pageview starts) and embeds itself in each Data Layer sending from this page (i.e. escort every event on the page). It’s value is a system time of GTM block start in the number of milliseconds format, like “1470900439977”
  3. (See the picture) Under every Pageview and Event Tag, connect the Custom Dimension (recall its index) and the value of Custom Variable “Pageview ID” (like you probably did for Hit timestamp previously).
So that must be it. Now each event falls into the right Pageview set (entity), with other events happened during the same pageview.

Tuesday, August 16, 2016

Product list optimization project, part 1.1. Google Analytics tuning: Custom dimensions

First of all, there is a common approach to manage Google Analytics implementation - through Google Tag Manager. This method has many advantages, but discussion of those is beyond the scope of this series of posts. From now on I assume this approach by default.

There is a great article of Simo Ahava “Improve Data Collection With Four Custom Dimensions”. It’s about 4 parameters which aren’t among dimensions and metrics of Google Analytics API by default, but are crucial for many web-analytics tasks. I didn’t use User ID in this project, but Client ID, Session ID and Hit timestamp were very helpful.

For those of you who decided to implement these custom dimensions too, I want to warn you about the subtle mistake in the Hit timestamp setting. I’ve written detailed comment about this under Simo’s article.

So in brief, when you configure Custom JavaScript Variable, you can’t treat milliseconds as other parts of the time (Hour, Minute, Seconds), because it’s a three-digit variable. Otherwise, “0.089” becomes “0.89” and exceeds “0.123”, which leads to awkward results in data when some page events precede their predecessors.

To fix this, you should add another function and apply it to milliseconds:

var pad00 = function(num) {
var norm = Math.abs(Math.floor(num));
return (norm < 10 ? '00' : (norm < 100 ? '0' : '')) + norm;

Wednesday, August 10, 2016

Products list optimization project, intro

The next series of blog posts is dedicated to the UX research project I’ve accomplished recently. The described approach could be useful for e-commerce websites and online stores, especially those with large product catalog.

Motivation and Main idea.
Products in the catalog subcategories get unequal shares of user attention. Those in the top of the list are seen by almost all users. Bottom of the list gets multiple times less attention (twice as a minimum). Most users don’t scroll all the list throughout.
But as statistics tells us, often the products from the top of the list don’t attract users’ interest. They don’t click on these products, they don’t put them to the shopping cart. At the same time, bottom list products get much more interest.
So the main idea is to analyze ratio of each product’s visibility (attention) and users interest, and put most popular items to the top of the list in order to sell more.

Summary of this series.
First, we make some tunings in our Google Analytics data collection process. We should track the page scroll to measure each product’s visibility. Also we need to gather information about clicks on the links to the product’s page and ‘Buy’ buttons for each product.
Second, we connect to the Google Analytics API from statistical environment (Rstudio in our case), retrieve necessary information, make some exploratory data analysis and get the final report. This report gives us directions as to what permutations should be done in subcategories of our product catalog.

So next time, we will start off with GA tuning.