
The Google Analytics Setup I Use on Every Site I Build

493 points | uptown | 9 years ago | philipwalton.com

99 comments

[+] jameslk|9 years ago|reply
It's pretty annoying that I have to create spam filters for Google Analytics to be useful. Every site I've installed GA on has required me to filter out spam. I don't understand why something isn't done about it at an engineering level. If site owners can set up filters against spammers, is it really that hard for Google to do it? Especially since they can see it across their accounts. Seems like it's the same type of issue that plagues email, yet Google seems to have that under control.
[+] throwawaydbfif|9 years ago|reply
You can get around this with some fairly simple hacks. Write some JavaScript that evals part of your page, or something crazy like loading part of itself from a ROT13 text file. Have this JS generate an ID you can identify as 'real' or 'fake', and filter your analytics by that ID. If you want to be extra funny, make real and fake IDs look indistinguishable to human eyes.

99.9% of spammers are too lazy to spend any time figuring this out for a single site, and their tools won't even tell them the spam isn't working. I've gotten away with adding a simple static ID to everything, and except for really large, juicy targets, spammers don't even waste time on this.

All of my sites get zero spam with this filter.
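The ID trick above can be sketched as follows. Everything here is an illustrative assumption (the hash, the function name, and the custom-dimension slot), not the commenter's actual code; `ga('set', ...)` itself is the standard analytics.js call.

```javascript
// Derive a token in page JavaScript that spam hits sent directly to the
// collection endpoint won't carry, then filter on it in GA.
// makeToken and the dimension slot below are illustrative assumptions.
function makeToken(path) {
  // Any deterministic function of the page works; a tiny rolling hash here.
  let h = 0;
  for (const c of path) h = (h * 31 + c.charCodeAt(0)) % 997;
  return 'v' + h;
}

// In the page, attach the token as a custom dimension before sending:
// ga('set', 'dimension1', makeToken(location.pathname));
// ga('send', 'pageview');
// Then create a GA view filter that keeps only hits carrying a valid token.
```

Because the token is computed in the browser, measurement-protocol spam that never runs your page JS won't have it.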

[+] pedalpete|9 years ago|reply
Can you elaborate as to what sort of spam you are referring to? Do you mean bots viewing your pages? Or is it something else?
[+] kristianc|9 years ago|reply
Echoing what others are saying, I much prefer Google Tag Manager. Many clients use a CMS, which makes injecting dynamic variables into a page a bit of a pain if it's not done via rules at runtime.

The Next Web has open-sourced its Google Tag Manager setup (https://github.com/thenextweb/gtm), which has things like Scroll Tracking, Engagement Tracking (riveted.js), Outbound Link Tracking and lots of other things that are not in the default GA setup. They have recently added support for AMP.

In my experience it allows clients to get up and running with a useful GA setup in a couple of hours and means that you as a developer don't get bothered to make trivial changes.

[+] aorth|9 years ago|reply

  Scroll Tracking, Engagement Tracking (riveted.js), Outbound Link Tracking and lots of other things that are not in the default GA setup.
I understand why a site owner would want those things, but as a user it is terrifying! This is why I run an ad blocker.
[+] zimbatm|9 years ago|reply
Google Tag Manager is not very practical in cases where you want a secure website. It renders CSP mostly useless because it forces you to loosen the script and style directives (e.g. allowing inline scripts). It's good to have on a marketing page where no user content is displayed, though.
[+] agentgt|9 years ago|reply
The only thing I have against Tag Manager is that the documentation is sort of confusing; setting up GTM manually is pretty painful. Luckily, most CMSes have plugins.
[+] ndynan|9 years ago|reply
+1 - As a product person, being able to quickly spin up click tracking on a feature to measure its stickiness is awesome.
[+] caleblloyd|9 years ago|reply
With the surge in Ad Blocking recently, part of me wonders how accurate the Google Analytics JavaScript tracker is today, and how accurate it will be in 5 years. I wonder if we'll see a trend back to server-side analytics soon.
[+] Klathmon|9 years ago|reply
Honestly, I prefer the Google Analytics setup.

People who block my tracking scripts don't want to be tracked, so I won't track them.

I use that info to see how people use a product, how they interact with it, what I can do to improve it, and where my time and money will be best spent.

If people want to block the scripts, that's fine, and I'm not going to try to get around them, but their "voice" is also muted: I'm no longer factoring in their usage patterns, or their usage at all.

[+] pierrefar|9 years ago|reply
I run a service called Blockmetry [0] that measures exactly that, directly from pageviews. Some numbers get published regularly [1]. The last published number, from August, is that 5.2% of non-bot, JS-enabled pageviews did not fire the analytics tag.

The short answer is that it's significant at an aggregate level worldwide, but the reality is that it varies _massively_ by country, device, day of the week [2], and even different sections of the same site. Additionally, there is a small percentage of pageviews with JS disabled that you have to account for. An analysis on HN earlier today [4] put it at 0.2% of pageviews worldwide, but, again, with huge variation (notably Tor, but elsewhere too).

Q4 numbers are not released yet, but the trend is generally up, with some notable drops. Get in touch if you want more info or to set it up on your site [5].

[0] https://blockmetry.com/ [1] https://blockmetry.com/weather [2] https://blockmetry.com/blog/weekday [4] https://blockmetry.com/blog/javascript-disabled [5] https://blockmetry.com/contact
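One way to approximate this measurement yourself (a sketch, not Blockmetry's actual method): the standard analytics.js snippet creates a stub `ga()` function immediately and sets `ga.loaded` once the real script arrives, so if `ga.loaded` never becomes true, the tag was probably blocked. The delay and the `/tag-blocked` endpoint below are made-up assumptions.

```javascript
// Returns true when the analytics library never finished loading,
// which usually means a content blocker stopped it.
function tagBlocked(win) {
  return !(typeof win.ga === 'function' && win.ga.loaded === true);
}

// In the page, check a few seconds after load and report to your own
// server (endpoint and delay are placeholders for illustration):
// setTimeout(function () {
//   if (tagBlocked(window)) navigator.sendBeacon('/tag-blocked');
// }, 3000);
```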

[+] fictioncircle|9 years ago|reply
Tbh, Google Analytics samples at scale, so people who block like that aren't affecting the results much. Well, unless they have truly "unique" patterns of using the UI specific to that demographic.
[+] spuiszis|9 years ago|reply
I think JavaScript analytics is more or less here to stay. A broader move to server-side analytics depends on what you're going to use the data for. When I want clean(er) data for important metrics, like revenue/conversion rate for eCommerce sites, I implement a hybrid JS/back-end solution where I send important data to the GA API or Mixpanel via some back-end service [1]. On a number of sites I have consulted for, I've found that revenue data in GA can be off quite a bit compared to the database, sometimes +/- 25%, depending on how the JavaScript has been implemented.

With larger businesses, you'll probably see more server-side implementations as they have the budgets to ensure the data they're collecting is accurate. For a blogger or a small publisher without a dedicated tech team, there's nothing easier than dropping in a script tag and watching the data roll in.

[1] https://developers.google.com/analytics/devguides/collection...
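The hybrid approach described above typically means sending hits from the server via the Measurement Protocol [1]. A minimal Node-flavored sketch; the property ID, client ID, and event values below are all placeholders.

```javascript
// Build a Measurement Protocol (v1) event hit as form parameters.
// All values here are placeholders; tid must be your real property ID.
function buildHit(tid, cid, category, action, value) {
  return new URLSearchParams({
    v: '1',            // protocol version
    tid: tid,          // GA property ID, e.g. 'UA-XXXXX-Y'
    cid: cid,          // anonymous client ID (reuse the one from the browser)
    t: 'event',        // hit type
    ec: category,      // event category
    ea: action,        // event action
    ev: String(value), // integer event value
  });
}

// Server-side send (sketch):
// fetch('https://www.google-analytics.com/collect', {
//   method: 'POST',
//   body: buildHit('UA-XXXXX-Y', '555', 'ecommerce', 'purchase', 2599),
// });
```

Reusing the browser's client ID server-side is what keeps the two halves of the hybrid setup stitched to the same visitor.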

[+] grey-area|9 years ago|reply
On technical sites I'm seeing traffic cut at least in half by GA against traffic measured server side (excluding bots), which isn't too surprising. On others by maybe a third. It is probably gradually becoming more and more inaccurate as users install blockers. I doubt many GA users are aware of this though. Anyone else seeing this?
[+] GrinningFool|9 years ago|reply
> see a trend back to server-side analytics soon.

We can only hope.

[+] coderdude|9 years ago|reply
The percentage of visitors with an ad blocker depends on your site's audience. Outside of computer geeks and gamers, almost no one uses ad blockers. I wouldn't buy into the hype that the whole world is installing ad blockers.
[+] Sir_Cmpwn|9 years ago|reply
Please don't contribute to Google's tracking dominance over the web. How insane is it that one company runs their javascript on 90% of the web?
[+] chishaku|9 years ago|reply
What are the best alternatives?
[+] jedberg|9 years ago|reply
It's a tough call, especially if your revenue model is ad based. The ad networks only trust third-party analytics.
[+] tombrossman|9 years ago|reply
Remember that it's mandatory to disclose to visitors that your site uses Google Analytics in their T&Cs https://www.google.com/analytics/terms/us.html (section 7, 'Privacy'). I don't see a privacy policy on this Google employee's page, but perhaps they have a special exemption?

Anyhow, for many websites you'll get more accurate traffic data with GoAccess parsing your logs and showing you page views and basic demographic data. Use it alongside Google Analytics if you must, to see the exact difference between what Google tells you your page views were versus what your server tells you.
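For reference, a typical GoAccess invocation looks something like this (a sketch; the log path is a placeholder, and you should match `--log-format` to your server's actual format per the GoAccess docs):

```shell
# Parse an nginx/Apache combined-format access log into an HTML report,
# skipping known crawlers. The log path is a placeholder.
goaccess /var/log/nginx/access.log \
  --log-format=COMBINED \
  --ignore-crawlers \
  -o report.html
```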

[+] peterhartree|9 years ago|reply
> for many websites you'll get more accurate traffic data with GoAccess parsing your logs and showing you page views and basic demographic data

Yes but remember that bot traffic may be more of an issue when analysing server side logs (a lot of bots still don't execute JavaScript).

It's hard to know how effective the bot filtering features in GoAccess are compared with those of Google Analytics.

[+] largehotcoffee|9 years ago|reply
Not many people know about this feature of GA, but add the following to anonymize your users' IP addresses before sending the information to Google.

    ga('set', 'anonymizeIp', true);

[+] pdkl95|9 years ago|reply
> anonymize your users IP addresses before sending the information to Google

That's a nice placebo that does almost nothing. Even if the packet body doesn't contain the IP address, it's still available in the IP header's Source Address field.

However, even if we assume Google - in a reversal of their general focus on gathering as much data as possible - doesn't recover the address from the IP header, their own documentation[1] for analytics collection URLs with the &aip=1 parameter (which should be present when 'anonymizeIp' is true) says:

    "... the last octet of the user IP address
     is set to zero ..."
Zeroing the least interesting 8 bits of the address doesn't make it anonymous. They still get to record the ASN, and they are recording at least 8 bits of fingerprintable data from other sources. It should be trivial to recover mostly-unique users, and calling this "anonymization" is at best naive and, for Google, an obvious lie.
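Concretely, the masking the docs describe amounts to this (a sketch of the documented behavior, not Google's code):

```javascript
// Zero only the final octet of an IPv4 address, as &aip=1 is documented
// to do. The remaining 24 bits still identify the network.
function maskLastOctet(ip) {
  const parts = ip.split('.');
  parts[3] = '0';
  return parts.join('.');
}
// maskLastOctet('203.0.113.42') -> '203.0.113.0'
```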

Their documentation even betrays their intentions:

    "This feature is designed to help site owners comply
     with their own privacy policies or, in some countries,
     recommendations from local data protection authorities,
     which may prevent the storage of full
     IP address information."
Actually making the data anonymous isn't the goal. They just want a rubber-stamp feature that lets them comply with the letter of the law.

[1] https://support.google.com/analytics/answer/2763052?hl=en

[+] cyborgx7|9 years ago|reply
Alternative title: The Spyware I Use on Every Site I Build
[+] thomasthomas|9 years ago|reply
Tag Manager is definitely preferable in my experience if you want to empower non-technical people, such as marketing, to make their own changes on the fly without having to bother developers.
[+] sjeanpierre|9 years ago|reply
Yup, GTM is great until the folks in marketing add a script that stops the app's UI from rendering, without first testing it in preprod.
[+] jon-wood|9 years ago|reply
You clearly have a more skilled marketing team than the one I tried to work with using GTM. I ended up dropping it because, rather than implementing tracking JavaScript in a text editor, I was having to do it in an obtuse GUI instead, and marketing still wouldn't go near it.
[+] betolin|9 years ago|reply
It's not so safe either: if non-developers just copy-paste code given by a third party, they might inject insecure JS.
[+] niutech|9 years ago|reply
Don't feed Google your visitors' data; respect their privacy and use the open-source Piwik instead.
[+] ns8sl|9 years ago|reply
What's the deal with stats delayed over 24 hours? Man, I hate that.
[+] shostack|9 years ago|reply
Beyond this info, I'd add my own suggestions from having spent a good portion of my career digging around in GA...

- If you have multiple domains, sub domains, etc. make sure to spend plenty of time reviewing the cross-domain setup documentation and test it thoroughly.

- If you have high volume, frequently do deep segmentations, use lots of custom dimensions, etc., make sure you have a clear understanding of how sampling in GA works and how to tell if you are being sampled, and find ways to avoid it by pulling reports in different ways. Otherwise you can end up making decisions off of 0.3% of your traffic; Google's sampling algorithm thinks that is fine, but comparison against other data sources often shows it is not.

- Make sure any reporting you do across things like GA vs. AdWords is done with a clear understanding of how they each report on paid search. GA reports on it by default on a last non-direct click basis. AdWords just counts everything AdWords touches. This means that AdWords can give you a good sense of where you are gaining traction, whereas GA can help you understand how it works in conjunction with other touch points, and perhaps how you might change the way you weight things and measure success.

- GTM is powerful and free, but with great power comes great responsibility. Also, it can be a real PITA sometimes.

- Annotations are a highly underutilized tool in GA and can save you a lot of headaches. I just wish there was a way to bulk import/export them via spreadsheet or API.

- You can't currently create goal funnels from event-based conversions (please, Google, add this!), but the workaround for the time being is to push virtual page views at the same time the event fires, and then create funnels off of those.

- User stitching sounds awesome, but is actually much more limited than you'd think from reading the overview. You need a separate view (which means the main GA view you use can't be segmented for the stitched sessions for comparison; only the new view contains the stitched users). And there's a 90-day rolling data retention window, so you need some sort of export process if you care about that data. Unfortunately, this is pretty important data if you have lots of cross-device tracking issues.

- Depending on your volume, you can reach the hit limits of the free tier pretty quickly if you start tracking a ton of events (since they all count as hits). Here's a good overview [1] of what these limits are, how they work, and what they mean for you. When I got the scary notification, Google was exceptionally unhelpful in working with me to resolve the problem, despite considerable ad spend. After reducing them to what we thought would be fine, they were unable to assure me that our data would not be nuked, and basically couldn't give me any real info beyond "this is the policy." Super frustrating.

- If you have good logging of events that tracks both server and client-side, it is healthy to compare for variances monthly or quarterly. You'd expect client-side tracking to break more often than server-side, but it is important to see how much that can alter your numbers.

[1] https://www.e-nor.com/blog/general/hit-count-in-google-analy...

[+] Roger_Jones|9 years ago|reply
Filtering out GA sessions with the language of "C" (versus actual languages like en-us, fr, etc.) goes a long way in filtering out GA spam.

This language code is associated with bots 99% of the time. I had one site where 20% of all the sessions in a given month were such fake traffic!
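Server-side, the same heuristic can be applied to the Accept-Language header before a hit is ever recorded; "C" is the POSIX default locale that many crude bots never bother to change. A sketch (the function name is made up, and as with any heuristic it can misfire on some legitimate traffic):

```javascript
// Flag requests whose reported language is the POSIX default "C"
// (or missing entirely), which overwhelmingly comes from bots,
// not real browsers.
function likelyBotLanguage(lang) {
  const l = (lang || '').trim().toLowerCase();
  return l === '' || l === 'c';
}
```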

[+] shostack|9 years ago|reply
Isn't that a relatively easy thing for a spammer to change? Also, I'm seeing some valid traffic coming in with that language (conversions and everything).