We ditched Google Analytics

490 points | felipebueno | 10 years ago | spideroak.com

262 comments

[+] eponeponepon|10 years ago|reply

    It took us only a few weeks to write our home-brew
    analytics package. Nothing super fancy, yet now we have
    an internal dashboard that shows the entire company much
    of what we used analytics for anyway - and with some
    nice integration with some of our other systems too.
I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.

And to echo other posters: SpiderOak deserve thanks. If I find myself with any need for a service like theirs, I know I'll be looking at them.

[+] gedrap|10 years ago|reply
>>> I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.

Ah, the "not invented here" syndrome!

There are tons of things that you could do "in a couple of weeks" that more or less work. However, it doesn't mean you have to or even that it would be a good idea.

If all developers adopted the attitude that you have expressed, there would be thousands of sad, sad developers who need to maintain shitty in-house analytics systems because someone once said "I could do it in a week". There are tons of awful CMSes already because someone once said "I could do better than wordpress" / "I could create a better framework" / etc.

In a lot of cases, GA is just good enough. Sure, you might need to spend some time exploring its features (custom dimensions, etc.); there's more to GA than the number of pageviews for a given day. There are cases when GA is not enough. Fair enough. But that's definitely not the majority of cases.

Sure, it makes sense for SpiderOak given its target audience. However, there's no need to make such a generic statement about 'anyone working in the tech sector'.

[+] efuquen|10 years ago|reply
I don't think it's a matter of laziness. It's more a question of where it's best to spend your expensive, valuable developer resources: on the product, or on some home-baked analytics framework?

I applaud SpiderOak, but they are much different from most other sites. They have privacy conscious customers to begin with, this is something that is good press for them and probably a net positive on their bottom line for doing it, not the case with most other sites. Also it's something they are doing after having a very mature product for many years, clearly not the first or most important thing they needed to tackle as a company.

[+] redthrowaway|10 years ago|reply
It's not laziness, it's opportunity cost. For SpiderOak, it makes sense to spend a few weeks of a few developers' time to roll their own analytics. For me, it doesn't. Our customers aren't privacy-focussed. In fact, our app depends on them explicitly sharing [quite a lot of very personal] data with us. I would rather spend that time building something that delivers value to them and us than indulging my personal beliefs about privacy.
[+] rplnt|10 years ago|reply
Aren't there self-hosted analytics anyway? Piwik[1] comes to mind first, but I'm sure there are many.

1. https://piwik.org/

[+] untog|10 years ago|reply
And I never quite grasp why many people working in the tech sector are insistent on reinventing things that already exist. Such thinking thrives on developers' personal sense of exceptionalism in my opinion.
[+] ascendantlogic|10 years ago|reply
That starts going down the path of the "not invented here" mindset. You could then attribute not hand-rolling every bit of infrastructure yourself as "laziness". Yes, I am lazy to the point that I don't want to hand-roll an industrial-strength RDBMS myself, or the operating system, or the networking protocol, or the key/value store, etc etc.
[+] onion2k|10 years ago|reply
If all you want to know is who accessed a site, with which browser, how long for, and which pages they looked at then you could get all that from your webserver's log files without writing any code. On the other hand, to build something that's robust, relatively scalable, works across browsers and devices, and can give you an event watching platform like GAnalytics gives you (eg the useful bit), that is far from trivial.
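For the simple case the comment describes, the log files really are enough. Here's a minimal sketch of mining a combined-format access log with only the standard library; the regex field names and the sample log lines are made up for illustration:

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count successful pageviews per path and hits per user agent."""
    paths, agents = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group('status').startswith('2'):
            paths[m.group('path')] += 1
            agents[m.group('agent')] += 1
    return paths, agents

sample = [
    '1.2.3.4 - - [10/Dec/2015:06:25:24 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Dec/2015:06:25:30 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Dec/2015:06:26:01 +0000] "GET /missing HTTP/1.1" 404 512 "-" "curl/7.43"',
]
paths, agents = summarize(sample)
print(paths['/blog'])  # prints 2
```

What this can't give you is the event-level, in-page tracking the comment calls "the useful bit" - that requires client-side instrumentation.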
[+] raverbashing|10 years ago|reply
Most developers don't develop (major) libraries, languages, and OSs in house; it doesn't mean they're lazy, it means the company needs to focus limited resources on its core business.
[+] bluedino|10 years ago|reply
>> Google Analytics thrives on developers' laziness in my opinion.

Every service does. Pingdom, GA, Olark, Github...

It took them a few weeks to write their own analytics. What features did they not implement? How many people worked on it?

Does your 1- or 2-person startup have 4 weeks to write its own analytics package, or do you have more important stuff to do? (I'm betting you do. Like launching your product instead of reinventing the wheel with analytics.)

[+] beerbajay|10 years ago|reply
> Google Analytics thrives on developers' laziness in my opinion.

It's almost never "developers" who are deciding to use GA; it's middle managers or marketing departments.

[+] digi_owl|10 years ago|reply
> thrives on developers' laziness in my opinion.

Frankly, most of what I read out of the tech world these days seems to be about pandering to developer laziness.

All manner of APIs and services seem to exist in their current form simply to extract rent from developers that don't want to do back end "dirty work".

[+] chimeracoder|10 years ago|reply
> I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.

Unless I'm mistaken, one big difference is that not using Google Analytics means you don't know which Google search pages people used to access your website. That can be a really important difference for some websites.

[+] morgante|10 years ago|reply
Having implemented two different custom analytics dashboards, it's a lot more complex than you think.

Sure, the basics are easy. But marketers and business people want to drill into a lot of data which is non-trivial to gather.

Unless you have a compelling business case (which SpiderOak does), it's not worth it.

[+] yoo1I|10 years ago|reply
A lot of people are replying to the suggestion of implementing your own analytics by calling out its NIH-ness.

I've recently been faced with this problem, and a solution doesn't have to be too complex.

There are roughly two parts to an analytics solution: event logging and, well, the actual analytics.

Writing your own logger in JavaScript is super simple: you're just sending off JSON objects to be inserted into an Elasticsearch cluster. Since you have to define that logging anyhow, the only extra work is the layer that does the actual AJAX requests.

What's left is defining and running your queries in Elasticsearch.

BAM! Analytics.

I realize it's not fit for every situation, but you can do some pretty complex things this way without a huge amount of effort...
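On the ingestion side, events like these would typically be batched into Elasticsearch's bulk API. A minimal sketch of building that request body with only the standard library, assuming a hypothetical `analytics` index (the event shapes are made up for illustration):

```python
import json
from datetime import datetime, timezone

def bulk_body(events, index="analytics"):
    """Build an Elasticsearch _bulk request body (NDJSON) from event dicts.

    Each event gets an action line followed by its document; the whole
    body can be POSTed to /_bulk with Content-Type: application/x-ndjson.
    """
    lines = []
    for event in events:
        doc = dict(event, ts=datetime.now(timezone.utc).isoformat())
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_body([
    {"type": "pageview", "path": "/pricing"},
    {"type": "click", "path": "/pricing", "target": "signup"},
])
print(body)
```

From there, the "actual analytics" half of the comment is just aggregation queries (terms, date histograms) over the indexed events.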

[+] tootie|10 years ago|reply

    much of what we used analytics for anyway
Until your requirements grow and you're stuck rebuilding something that was in GA 5 years ago.
[+] ubersync|10 years ago|reply
Don't ad blockers block Google Analytics by default? If I'm not wrong, uBlock Origin does.

So, I think, as more and more people start using ad blockers, site owners will get less and less accurate stats from Google Analytics, forcing them to implement their own solutions. Hopefully, open-source solutions will start providing the best features that Google does.

[+] jkestner|10 years ago|reply
And GA is inscrutable. I don't use it very much because it's got way too many layers of abstraction. It was fine before as Urchin. Maybe this is a category like email clients — there should be a sustainable paid product that doesn't suck.
[+] Retra|10 years ago|reply
Everything developers don't do is a matter of laziness if you ignore the fact that they might have other priorities.
[+] splatcollision|10 years ago|reply
Looking for a good npm / express middleware module that does this. Combines privacy concerns + developer laziness!
[+] Veratyr|10 years ago|reply
Not strictly on topic so I apologise if this is unwanted but I thought I'd share my experience with SpiderOak in case anyone here was thinking of purchasing one of their plans.

In February SpiderOak dropped its pricing to $12/month for 1TB of data. Having several hundred gigabytes of photos to back up, I took advantage and bought a year-long subscription ($129). I had access to a symmetric gigabit fibre connection, so I connected, set up the SpiderOak client, and started uploading.

However I noticed something odd. According to my Mac's activity monitor, SpiderOak was only uploading in short bursts [0] of ~2MB/s. I did some test uploads to other services (Google Drive, Amazon) to verify that things were fine with my connection (they were) and then contacted support (Feb 10).

What followed was nearly __6 months__ of "support": first they claimed it might be a server-side issue and moved me "to a new host" (Feb 17); then, when that didn't resolve my issue, they ignored me for a couple of months before handing me over to an engineer (Apr 28) who told me:

"we may have your uploads running at the maximum speed we can offer you at the moment. Additional changes to storage network configuration will not improve the situation much. There is an overhead limitation when the client encrypts, deduplicates, and compresses the files you are uploading"

At this point I ran a basic test (cat /dev/urandom | gzip -c | openssl enc -aes-256-cbc -pass pass:spideroak | pv | shasum -a 256 > /dev/zero) that showed my laptop was easily capable of hashing and encrypting the data much faster than SpiderOak was handling it (Apr 30) after which I was simply ignored for a full month until I opened another ticket asking for a refund (Jul 9).

I really love the idea of secure, private storage but SpiderOak's client is barely functional and their customer support is rather bad.

[0]: http://i.imgur.com/XEvhIop.png
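For anyone wanting to reproduce that kind of client-side throughput check without the shell pipeline, here's a rough stdlib-only Python analogue (the standard library has no AES, so this measures only the compress-and-hash half of the claimed overhead; the data sizes are arbitrary):

```python
import hashlib
import os
import time
import zlib

def throughput_mb_s(total_mb=64, chunk_mb=4):
    """Compress and hash pseudo-random chunks, returning MB/s processed.

    A rough analogue of the `urandom | gzip | openssl | shasum` pipeline
    above, minus the encryption step.
    """
    sha = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(total_mb // chunk_mb):
        chunk = os.urandom(chunk_mb * 1024 * 1024)
        sha.update(zlib.compress(chunk, 1))  # fastest compression level
    return total_mb / (time.perf_counter() - start)

print(f"{throughput_mb_s():.0f} MB/s")
```

If this number comes out far above the observed upload rate, the bottleneck is unlikely to be local CPU overhead - which is essentially the argument the parent made to support.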

[+] Someone1234|10 years ago|reply
Many of these types of services seem to intentionally cap upload speeds to reduce their potential storage liability (since they're likely over-selling storage to be able to offer 1 TB for $12 with the level of redundancy, staffing costs, etc, needed).

I wonder if that is happening in this specific case? Although if it were the case the vendor should still be honest about it. Just saying they limit uploads to 2 Mbps is better than giving the run-around.

[+] mark_l_watson|10 years ago|reply
That doesn't sound good. On the other hand, I use SpiderOak with not a lot of cloud storage use, with clients on OS X, Linux, and until this morning Windows 10. The only problem I ever had was more or less my fault - trying to register a new laptop with a previously named setup.

BTW, why store photos and videos on encrypted storage? For that I use Office 365's OneDrive: everyone in my family gets a terabyte for $99/year, and I really like the web versions of Office 365 because when I'm on Linux and someone sends me an Excel or Word file, no problem, and I don't use up local disk space (with SSD drives, something to consider).

[+] ldehaan|10 years ago|reply
This has been my experience as well, not to mention how much the client slowed down my machine. It's been really slow going, but the client is getting better. I never tried doing the encryption on my side; they also do diffs on each file you upload, so I imagine that has something to do with the lag. I still use SpiderOak: they're the only company I'm aware of that encrypts locally, and they've done a lot to progress personal security for all of us. So I've gotten used to the slow speeds and buggy software; it keeps getting better, so that's a big plus :)
[+] theandrewbailey|10 years ago|reply
I was going to post a comment about how cloud storage is more of a means to move data around rather than back it up, until I dug a little deeper and saw that SpiderOak actually pitches itself primarily as a backup provider. I agree, it needs to be much faster than that.
[+] kbenson|10 years ago|reply
Is it possible that they are working on batches, and not doing any hashing/compression in parallel with the uploading? It seems feasible from your screenshot that they are getting ~10GB of data at a time, compressing(?) and hashing, and then uploading, and then starting on the next ~10GB.
[+] draw_down|10 years ago|reply
This comment is ridiculous, and so is the fact that it's at the top. This is supposed to be about Google Analytics, come on.
[+] Paul-ish|10 years ago|reply
Could the issue be caused by bad peering between your ISPs?
[+] bluedino|10 years ago|reply
>> my laptop was easily capable of hashing and encrypting the data much faster than the network was capable of handling it

You are assuming that you are the only one using that uplink and that server.

[+] buro9|10 years ago|reply
Why not move to push GA data server-side?

Trivial to set up, immune to adblockers affecting the completeness of data, prevents writing tracking cookies, and leaves the data and utility of the GA dashboard mostly intact (you lose client capabilities and some session-based metrics).

This is the route I'm preferring to take (being applied this Christmas via https://pypi.python.org/pypi/pyga ).

One may argue that Google will still be aware of page views, but the argument presented in the article is constructed around the use of the tracking cookie and that would no longer apply.

I'm shifting to server-push to restore completeness; I presently estimate that client-side GA represents barely 25% of my page views (according to a quick analysis of server logs for a 24-hour period). I'm looking for insight into how my site is used rather than the capabilities of the client, so this works for what I want.
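For reference, pyga and similar libraries ultimately send hits over GA's Measurement Protocol. A minimal sketch of building such a pageview hit with only the standard library (the tracking ID below is a placeholder, and the network call is left commented out):

```python
import urllib.parse
import urllib.request
import uuid

GA_ENDPOINT = "https://www.google-analytics.com/collect"

def pageview_payload(tracking_id, client_id, path, hostname):
    """Build a Measurement Protocol v1 pageview hit as a form-encoded string."""
    return urllib.parse.urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID (server-generated, no cookie)
        "t": "pageview",
        "dh": hostname,      # document hostname
        "dp": path,          # document path
    })

payload = pageview_payload("UA-12345-1", str(uuid.uuid4()), "/pricing", "example.com")
# To actually record the hit (a real network call, so commented out here):
# urllib.request.urlopen(GA_ENDPOINT, data=payload.encode()).read()
print(payload)
```

Because the `cid` is generated server-side, no tracking cookie ever reaches the visitor's browser - which is exactly the trade-off described above.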

[+] oneJob|10 years ago|reply
How about open-sourcing your product before worrying about improving other products? SpiderOak has been "investigating a number of licensing options, and do expect to make the SpiderOak client code open source in the not-distant future" for a very, very long time now. It's no trivial thing to have a closed source client for a "zero knowledge" service.

https://spideroak.com/faq/why-isnt-spideroak-open-source-yet...

EDIT: I'd welcome discussion, in addition to your up/down votes

[+] prajjwal|10 years ago|reply
I came here for this exact thing. They said they were going to go open source in 2014 IIRC, and failed to deliver. I have stopped using SpiderOak - how am I supposed to trust them with my most private files when I can't verify that they're not doing anything shady on my machine?

The opening line of this post is amusing. They ought to give thought to fixing their core product first.

[+] cm2187|10 years ago|reply
The other thing is that Google Analytics is on many adblocker lists, precisely for that reason. As adblockers become widespread, the analytics are going blind.
[+] nateberkopec|10 years ago|reply
An open-source, self-hostable solution providing 80% of common Google Analytics functionality seems doable to me.

Is there anything out there in this realm? If not, why not?

[+] trebor|10 years ago|reply
To any of the SpiderOak team: thank you.

It's more than just the tracking cookie, though. It's also about Google aggregating all its website data into a unified profile. The data they have on everyone is frightening—all because of free services like GA.

[+] c0achmcguirk|10 years ago|reply
SpiderOak user here. I stopped using Dropbox and started using SpiderOak about 18 months ago. I really like the product. It's not as good as Dropbox in some ways (like automatically syncing photos from my phone), but it really is easy to use. I still have a mobile client on Android, and I can keep my files in sync across multiple computers. I pay for the larger storage size and I'm not even close to using it all.

It syncs fast too. Just thought I'd share my experience with people.

[+] eljimmy|10 years ago|reply
Is it just me or is this a click-bait title with hollow content?
[+] lukeqsee|10 years ago|reply
> Like lots of other companies with high traffic websites, we are a technology company; one with a deep team of software developer expertise. It took us only a few weeks to write our home-brew analytics package.

I'm a little curious why they decided to go this route instead of using one of the open-source solutions. Aren't there good solutions to this problem already?

[+] rogeryu|10 years ago|reply
I'm doing my part. I'm moving to DuckDuckGo for searching more and more. It's a process. Google does have better results. For work I still rely on Google, for private stuff I use https://duckduckgo.com/

And for the sake of ducks, I'm eating less meat as well. No more chicken - too much antibiotics, and as little meat as possible, only when it's worth it, so great taste and good quality.

[+] kordless|10 years ago|reply
> Sadly, we didn’t like the answer to that question. “Yes, by using Google Analytics, we are furthering the erosion of privacy on the web.”

The only thing "wrong" with using an analytics service to better understand your customers is that it places all knowledge of visits, including ones the visitor wished to keep private, in a centralized location. This can be useful in providing correlation data across all visitors in aggregate, such as which browsers you should make sure your site supports most of the time.

In other words, there exists some data in aggregate that is valuable to all of us, but the cost is a loss of privacy for smaller sets of personal data.

If individuals don't want certain behaviors analyzed by others, then they shouldn't use centralized services which exist outside their realm of control. These individuals would be better off using a "website" that is hosted by themselves, inside their own four walls, running on their own equipment. A simple way for SpiderOak to address this is to put their website on IPFS or something similar.

I appreciate the fact that SpiderOak is thinking about these things. It's important!

[+] cpncrunch|10 years ago|reply
>why does Google and their advertisers need to know about it I would ask

Google is pretty clear about this. The only reason they track you is for advertising, and there isn't any evidence of them using the info for anything else. In fact, there is a lot of evidence pointing the other way, such as their insistence on encrypting data flowing between their datacenters.

This is Google we are talking about, not Kazakhstan, China or Russia.

[+] _lce0|10 years ago|reply
Kudos for this!!

It's interesting that there's still a meta tag, probably a leftover:

    <meta name="google-site-verification" content="pPH9-SNGQ9Ne6q-h4StA3twBSknzvtP9kfEB88Qwl0w">
EDIT: wow, thanks for your answers guys!! so nice to see Cunningham's law in action ;)
[+] rbinv|10 years ago|reply
> It took us only a few weeks to write our home-brew analytics package.

Unfortunately, there's no way to replicate what Google Analytics currently offers (for free!) within a couple of weeks (or even months). Not with big data sets. Yes, GA does enforce sampling if you don't pay for GA Premium, but the free edition is still one hell of a deal (if you don't care about privacy).

If you only use Google Analytics as a hit counter, sure, you can do that yourself within a couple of minutes. The advanced features are way more complicated, though (think segmentation and custom reports).

This also raises the question: why not use Piwik?

[+] ksec|10 years ago|reply
To me, it's the cost that matters. Most other analytics services cost $30-$50 per million pageviews/datapoints. To me this is expensive. Even when you scale to 100M, it will still cost ~$20/million.

Piwik doesn't scale. At least, it doesn't scale unless you spend lots of resources tinkering with it. Its Cloud Edition is even more expensive than GoSquared, which I consider to be a much better product.

What we basically need is a simple, effective, and cheap enough alternative to GA. And so far there are simply none.

[+] api|10 years ago|reply
Instead of rolling your own look at Piwik. It works very well and is basically a GA clone. I actually like it better than GA in some ways. It's easy to set up and you can run it on your own site so you're not contributing to a global tracking fabric.
[+] sghiassy|10 years ago|reply
I don't get it. SpiderOak states that they dropped GA because it furthers "the erosion of privacy on the web", but then they just started tracking in-house.

How is tracking in-house more private than GA? The user is still being tracked.

[+] kevin_thibedeau|10 years ago|reply
I haven't checked my GA in months, since it became clear that Google won't bother doing anything to fix the referrer-spam problem that makes the stats useless if you don't have a high-volume site. It's not like these abusers are hard to track down, but I'll be damned if I'm going to manually add filters to get rid of them every time they come in from a new domain.