As someone who has built several top-1000 trafficked websites over the past decade, here is what the publishing industry definitely needs out of an analytics program.
1. Please give me a report that can prove that my user traffic is real.
2. Please give me a report that can prove that the traffic is healthy.
I know that I can get this from analytics now, but it needs to be the focus.
For a decade I've competed against content websites that for the most part game SEO traffic, build click traps and generally pollute the Internet with secondary-source content. I've always had fairly large audiences on my sites, with healthy 50% returning-visitor rates. However, when it came to getting ad dollars, I always lost to competitors who had much larger volume, mostly because they were either buying meaningless inbound links or using some other scam like click-trap "we recommend this hot girl talking about prostate cancer" photos to goose their numbers. Meanwhile we'd create quality content and my sites would have hundreds of comments, while theirs would have very few. It didn't matter that my audience was more engaged; advertisers bought volume.
I just need something that I can show to an advertiser (or even better, that they have access to and can compare) that says... hey, this website isn't a constructed fabrication made to fake volume and take your money, you sucker. This is a real website.
A lot of the industry right now is based upon buying links from aging front-door portals (Yahoo, MSN, AOL), which still do ungodly amounts of traffic with a mostly Internet-illiterate audience. Sites buy these links, convert them into CPM click traps on their targeted magazine sites and sell their inventory to advertisers who don't know that the whole thing is a shell game. They think they're buying ads on a hot new site with explosive growth.
I'm building an analytics interface for GA though and would love to chat about what else publishers need in an analytics interface - luke at itsninja if you'd like to chat.
Hey snide - I think we can probably help you at Snowplow Analytics. We warehouse all your atomic event data (including page views and in-page pings, which are very hard to fake) with IP address, browser fingerprint, 1st party cookie, optional 3rd party cookie, optional business-defined user ID, user timezone, browser features, useragent... If that sounds useful for proving your audience to advertisers, get in touch!
Do you have more information on how I can see a live view of the practices you discussed in your last paragraph? I was under the impression that traffic from the top portals was costly and not exactly suitable as a component in an arbitrage play like you mentioned.
I started saving all my page views in a PostgreSQL database. The schema is pretty simple.
I have the following tables:

sessions:
    session_id (uuid type)
    created_at

page_views:
    page_view_id
    session_id
    created_at
    site_id
    path
    query_string (hstore)
    user_agent
    referral_url
    ip_address
    user_id
    http_method (get, post, etc)
    details (hstore, used to tag page views/actions)

This allows me to simply query all my page views against data in my live database. I can see the path a user took to place an order. I can easily integrate A/B tests. If someone uses a coupon on the site and we want to see if they later came back and viewed/purchased more, we can easily write a SQL query to figure that out. We can simply figure out lifetime customer value, even if not logged in. If we're getting a large amount of traffic from a certain affiliate, we can alert our staff.
It's really awesome to be able to have your data in the same place. Having analytics data spread out to GA made it difficult to match that data against ours. If we need to scale out to multi-terabytes, postgres_fdw will make querying against the analytical database simple.
Since we're also tracking affiliate purchases to pay out commissions, I also have another table that stores additional information about a page view if it came from an affiliate site (click id, the affiliate network, etc).
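To make the coupon example concrete, here is a minimal sketch of that kind of schema and query. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, so the uuid and hstore columns become plain text. The `record_view` and `returning_coupon_sessions` helpers, the `/checkout` path, and the `coupon=` query-string convention are all assumptions made for illustration, and it assumes `session_id` comes from a long-lived cookie so that a return visit lands in the same session.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,   -- uuid type in the original Postgres schema
    created_at TEXT NOT NULL
);
CREATE TABLE page_views (
    page_view_id INTEGER PRIMARY KEY,
    session_id   TEXT NOT NULL REFERENCES sessions(session_id),
    created_at   TEXT NOT NULL,
    site_id      INTEGER,
    path         TEXT NOT NULL,
    query_string TEXT,             -- hstore in the original
    user_agent   TEXT,
    referral_url TEXT,
    ip_address   TEXT,
    user_id      INTEGER,
    http_method  TEXT,
    details      TEXT              -- hstore in the original
);
""")


def record_view(session_id, created_at, path, query_string=""):
    """Insert the session on first sight, then the page view itself."""
    conn.execute(
        "INSERT OR IGNORE INTO sessions (session_id, created_at) VALUES (?, ?)",
        (session_id, created_at),
    )
    conn.execute(
        "INSERT INTO page_views (session_id, created_at, path, query_string, http_method)"
        " VALUES (?, ?, ?, ?, 'get')",
        (session_id, created_at, path, query_string),
    )


def returning_coupon_sessions():
    """Sessions that used a coupon and have at least one later page view."""
    rows = conn.execute("""
        SELECT DISTINCT c.session_id
        FROM page_views AS c
        JOIN page_views AS later
          ON later.session_id = c.session_id
         AND later.created_at > c.created_at
        WHERE c.query_string LIKE '%coupon=%'
        ORDER BY c.session_id
    """).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data: session A comes back a week later, session B does not.
session_a = str(uuid.uuid4())
session_b = str(uuid.uuid4())
record_view(session_a, "2013-05-01T10:00:00", "/checkout", "coupon=SPRING")
record_view(session_a, "2013-05-08T09:00:00", "/products/42")
record_view(session_b, "2013-05-02T11:00:00", "/checkout", "coupon=SPRING")

print(returning_coupon_sessions() == [session_a])
```

Timestamps are stored as ISO 8601 strings here, so comparing them as text matches chronological order; in PostgreSQL a real timestamp column would do the same job.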
Yeah, we do that kind of stuff as well. At least you know what your data means. But when you start getting millions of hits a day, you won't necessarily want to spend some time scaling your system... In that case leaving it to the pros and focusing instead on your product may prove the most sensible move.
The last paragraph is important. I spent some time on this earlier this week when I learned about Universal Analytics -- but quickly discovered that UserID tracking hasn't shipped yet.
Can anyone on the GA team speculate about a release date for the uid bits?
The userId bit has been there from the start in Universal Analytics. It's called custom dimensions, and it can be used to send any property about the user into GA and then link it to a User or a specific Visit.
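As a rough sketch of what that looks like on the wire, the snippet below builds a Universal Analytics Measurement Protocol pageview payload that carries an opaque user key in custom dimension 1 (`cd1`). The `build_pageview_hit` helper, the tracking ID, the client ID and the user key are placeholders for illustration, and the dimension would first have to be defined in the GA property. Nothing is sent over the network here.

```python
from urllib.parse import urlencode, parse_qs


def build_pageview_hit(tracking_id, client_id, page_path, user_key, dimension_index=1):
    """Return a URL-encoded Measurement Protocol (v1) pageview payload."""
    params = {
        "v": "1",                         # protocol version
        "tid": tracking_id,               # GA property ID
        "cid": client_id,                 # anonymous client ID
        "t": "pageview",                  # hit type
        "dp": page_path,                  # document path
        f"cd{dimension_index}": user_key  # custom dimension holding the opaque user key
    }
    return urlencode(params)


# Placeholder values, not real identifiers.
payload = build_pageview_hit(
    "UA-XXXXX-Y",
    "35009a79-1a05-49d7-b876-2b884d0f825b",
    "/pricing",
    "u-8f3a2c",
)
print(parse_qs(payload)["cd1"])
```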
I was not aware that the new analytics would track users. One interpretation of section 7 of the Google Analytics Terms of Service is that tracking individuals is not allowed:
Tracking an individual is different from storing personally identifiable information. As a way to track you, I can assign you an arbitrary (or seemingly arbitrary) userID that is unique to you but does not personally identify you. This arbitrary userID is meaningless to any third parties. What I cannot assign you is your name, email address, or even IP address as a way to track you, since anyone who sees that information could figure out who it belongs to.
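One way to get a "seemingly arbitrary" ID is to derive it from an internal account number with a keyed hash, so the same user always gets the same value but the value reveals nothing without the key. This is only a sketch: the `tracking_id_for` helper, the key and the 16-character truncation are assumptions for illustration, not anything Google prescribes.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would be a private random key kept server-side.
SECRET_KEY = b"replace-with-a-private-random-key"


def tracking_id_for(internal_user_id: int) -> str:
    """Derive a stable, opaque tracking ID from an internal numeric user ID."""
    digest = hmac.new(SECRET_KEY, str(internal_user_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


print(tracking_id_for(42) == tracking_id_for(42))  # same user, same ID
print(tracking_id_for(42) != tracking_id_for(43))  # different users, different IDs
```

The input is an internal numeric ID rather than a name or email address, so no personal data enters the hash in the first place.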
This article is really making a big deal out of nothing. All the "major issues" brought up here only create problems in edge cases. When you're trying to drive growth or understand your users (the purpose of metrics at the end of the day) you should not be focused on edge cases.
In most cases the reason you care about tracking logged-out -> logged-in behavior is to measure onboarding behavior, understanding what the user does pre-signup so you can do a better job of driving signups. Signup is not a multi-client process in the common case so being able to track multi-client behavior pre-signup doesn't really matter at all.
Agreed, these are edge cases. They did create a lot of questions for me though, and made the whole thing rather confusing as a user.
As to how much of an issue these edge cases represent, I find it hard to get a real sense of it. I guess it really depends on the situation, what you want to measure and the user experience you offer to your visitors.
My gripe about Google Universal Analytics (analytics.js vs. ga.js) is broken backwards compatibility: cookie data is no longer stored in the same way, and this was an interface many add-ons/systems have used and depended on since the days of Urchin.
Otherwise, the new interface is pretty slick, the features look good, and the API to send data server side is so much nicer.
> For one, there can’t be 2 [clientID, userID] couples with the same userID: with the way Mixpanel does things, this is essentially a technically impossible scenario (...) And yet one user can access your site through different clients, leading to a systematic overestimation of the number of visitors hitting your site.
Really? Can anyone confirm this behavior? I'm pretty sure KissMetrics doesn't have this limitation.
Indeed, and this is why we ended up choosing KM over MP. With KM you just "identify" a visitor whenever you want and if there's already another anonymous cookie, it'll tie together all events retroactively. We couldn't find an easy way to do this with MP when we looked at it.
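The retroactive tie-together described here can be sketched as an alias table that is consulted at query time. This is a toy model of the idea under simple assumptions, not how KissMetrics or Mixpanel actually implement it; the `EventStore` class and its method names are made up for illustration.

```python
class EventStore:
    """Toy event log with late-bound identity resolution."""

    def __init__(self):
        self.events = []   # list of (client_id, event_name) in arrival order
        self.aliases = {}  # client_id -> user_id, filled in by identify()

    def track(self, client_id, name):
        """Record an event against whatever anonymous client ID the cookie holds."""
        self.events.append((client_id, name))

    def identify(self, client_id, user_id):
        """Link a client to a known user; earlier events are covered automatically."""
        self.aliases[client_id] = user_id

    def resolve(self, client_id):
        """Return the known user for a client, or the client ID if still anonymous."""
        return self.aliases.get(client_id, client_id)

    def unique_visitors(self):
        return len({self.resolve(client_id) for client_id, _ in self.events})

    def events_for(self, user_id):
        return [name for client_id, name in self.events if self.resolve(client_id) == user_id]


store = EventStore()
store.track("laptop-cookie", "viewed pricing")
store.track("phone-cookie", "viewed blog")
visitors_before = store.unique_visitors()  # two anonymous clients so far

# The same person logs in on both devices.
store.identify("laptop-cookie", "user-1")
store.identify("phone-cookie", "user-1")
visitors_after = store.unique_visitors()

print(visitors_before, visitors_after, store.events_for("user-1"))
```

Because identities are resolved when the report is run rather than when the event is stored, the two clients collapse into one visitor and the pre-login events show up under the user without rewriting any history.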
Last I checked, user-based analytics is directly against the Google TOS. You are not supposed to store any identifying information about specific users, probably because Google has been under privacy scrutiny. So not only is Google not built for user-based tracking, they prohibit it, making them a real non-starter in any case.
Check out the Google I/O video I mention in the article if you need convincing. As far as not collecting user data for privacy reasons, I think brandon0's comment says it all.
snide|12 years ago
lukestevens|12 years ago
alexatkeplar|12 years ago
omarchowdhury|12 years ago
joevandyk|12 years ago
Here's the plpgsql function I use for saving the sessions and page views: https://gist.github.com/joevandyk/f63523cdd1a3aa75d0ec
duwip|12 years ago
mikeknoop|12 years ago
hu_me|12 years ago
https://developers.google.com/analytics/devguides/collection...
j_s|12 years ago
http://www.google.com/analytics/terms/us.html
http://productforums.google.com/forum/#!topic/analytics/tTaq...
Brandon0|12 years ago
jamiequint|12 years ago
duwip|12 years ago
taf2|12 years ago
broken compatibility just kinda sucks though
jdangu|12 years ago
losvedir|12 years ago
KaoruAoiShiho|12 years ago
duwip|12 years ago