Wednesday, August 10, 2016

Product list optimization project, intro

This series of blog posts is dedicated to a UX research project I completed recently. The approach described could be useful for e-commerce websites and online stores, especially those with a large product catalog.

Motivation and main idea.
Products in the catalog subcategories get unequal shares of user attention. Those at the top of the list are seen by almost all users. The bottom of the list gets several times less attention (at least two times less). Most users don’t scroll through the whole list.
But as the statistics tell us, the products at the top of the list often fail to attract users’ interest: users don’t click on them and don’t add them to the shopping cart. At the same time, products at the bottom of the list get much more interest.
So the main idea is to analyze the ratio of users’ interest in each product to that product’s visibility (attention), and to move the most popular items to the top of the list in order to sell more.
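
To make the idea concrete, here is a minimal sketch in Python. It is only an illustration, not the analysis from this project: the `ProductStats` fields, the sample numbers and the choice to weight clicks and ‘Buy’ clicks equally are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProductStats:
    name: str
    views: int      # users who scrolled far enough to see the product
    clicks: int     # clicks on the link to the product page
    cart_adds: int  # clicks on the 'Buy' button


def interest_ratio(p: ProductStats) -> float:
    """Interest events per view; 0.0 for a product nobody has seen."""
    if p.views == 0:
        return 0.0
    # Assumption: a click and a 'Buy' click count equally as interest.
    return (p.clicks + p.cart_adds) / p.views


def rerank(products: list[ProductStats]) -> list[ProductStats]:
    """Return the products sorted by interest per view, highest first."""
    return sorted(products, key=interest_ratio, reverse=True)


# Made-up numbers, listed in their current order on the page.
catalog = [
    ProductStats("top-item", views=1000, clicks=20, cart_adds=2),
    ProductStats("middle-item", views=600, clicks=45, cart_adds=9),
    ProductStats("bottom-item", views=300, clicks=60, cart_adds=15),
]

new_order = [p.name for p in rerank(catalog)]
print(new_order)
```

With these sample numbers the item at the bottom has the highest interest per view, so it would move to the top. In practice you would probably also want a minimum number of views before trusting a ratio.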

Summary of this series.
First, we adjust our Google Analytics data collection. We need to track page scrolling to measure each product’s visibility. We also need to gather information about clicks on the links to each product’s page and on the ‘Buy’ buttons.
Second, we connect to the Google Analytics API from a statistical environment (RStudio in our case), retrieve the necessary data, do some exploratory data analysis and produce the final report. This report tells us which products should be reordered in the subcategories of our product catalog.

So next time, we will start with the GA setup.

Monday, June 6, 2016

Learning Python for Data Analysis and Visualization course finished!

It finally happened! I finished the Learning Python for Data Analysis and Visualization course on Udemy, which has over 100 lectures. It is a very good introduction to data science with Python, with lots of useful outbound links for further exploration. It also left me a lot to think about, such as which tasks call for Python and which for R.

Sunday, April 10, 2016

Scroll depth tracking

Understanding how users scroll the pages on your site is important for measuring engagement. Combined with metrics like “time on page”, it gives you valuable information about people’s interest in your content.

I know, I know... If you google something like “scroll depth tracking”, you’ll find a ready-to-use Google Analytics plugin, which is apparently quite good. But what if you don’t want to involve jQuery, or you wish to tune every subtle detail, or you have another reason to reject a ready-made solution? Also, I don’t really like the idea of sending scroll stats (e.g. “50% achieved”) to GA before the user has finished working with the page.

Here is my approach.

Friday, March 11, 2016

Tracking categorical variables with Google Analytics and Google Tag Manager

Recently I was working on a magazine’s website, and one of the tasks was to build a report on the authors’ popularity. Surprisingly, this turned out to be a bit challenging.

I needed to sum up the visits to all article pages for each author and use these numbers to build a ranking of authors. The problem is that any author can have many articles, and any article can be written by several authors (coauthors). In SQL terms, it’s a “many-to-many relationship”.

First of all, ‘author’ is a categorical variable, since it takes discrete string values, such as an author’s first name concatenated with a surname. Therefore, we can’t use Google Analytics Custom Metrics, since a custom metric can only be a number, a time or a currency value. There is no dictionary option here.

On the other hand, Google Analytics offers Custom Dimensions for such cases. If you have implemented Google Tag Manager, you can just create a new custom dimension ‘author’ and pass the author’s name from each article’s page through the data layer. But that doesn’t work in our “many-to-many” case!

If an article was written by two or more authors, you can’t pass one long string with the authors’ names, like this: “A.Johns, B.Johnson, C.Jacobs”. This is not an option, because you need to make three (in this example) distinct database entries: one for each author.

So we have a categorical variable with multiple values for each pageview.
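
To show the shape of the result we are after, here is a small Python sketch. It is not the tracking setup itself, just an illustration of “one entry per author”: the `rows` data and the function name `pageviews_by_author` are made up for this example.

```python
from collections import Counter

# Hypothetical data: (article URL, list of its authors, pageviews).
rows = [
    ("/articles/1", ["A.Johns", "B.Johnson", "C.Jacobs"], 120),
    ("/articles/2", ["A.Johns"], 80),
    ("/articles/3", ["B.Johnson", "C.Jacobs"], 50),
]


def pageviews_by_author(rows):
    """Credit each article's pageviews to every one of its authors."""
    totals = Counter()
    for _url, authors, pageviews in rows:
        for author in authors:
            # One separate entry per author, not one combined string.
            totals[author] += pageviews
    return totals


totals = pageviews_by_author(rows)

# Authors sorted by total pageviews, highest first.
ranking = totals.most_common()
print(ranking)
```

Note that an article with three authors is counted three times, once for each of them, so the per-author totals add up to more than the site’s total pageviews. That is expected for this kind of report.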

Here is my approach to handling this situation...

Friday, July 10, 2015

Google Tag Manager certificate

Here is one more certificate for one more very useful tool from Google. This time it is Google Tag Manager. I've already used it in my work, but now my knowledge of it is more systematic.
I'm looking forward to working more closely with GTM while implementing and customizing analytics on websites and mobile.

Monday, July 6, 2015

Kaggle Walmart competition

I just published on GitHub the code that earned me my first Kaggle badge (top 25%).
https://github.com/Oleg-Davydov/kaggle_walmart_competition
I used gradient boosting with the R package caret. But before that, there was a lot of preprocessing.
This was my second Kaggle competition. I finished 86th out of 485 participants, with a score difference of about 1% from the winner.
Probably not bad for a start :)

Sunday, July 5, 2015

Google Analytics certified

I've worked with Google Analytics almost since its birth in 2005, but I never paid much attention to certification. Now it's done! I just passed the GAIQ exam with a score of 95%. So now I'm Google Analytics certified:
https://www.google.com/partners/#i_profile;idtf=112782820638777969559;

Thanks to Google for that brilliant tool!