Lightweight Alternatives to Google Analytics

[+] bad_user|5 years ago|reply

I have my own self-hosted Matomo instance [1].

Via Docker & docker-compose it's quite easy to install and keep up to date and Matomo is open source, well maintained, very well behaved and pretty hands off.

And I configured it on my websites with cookies turned off [2] and with IP anonymization [3]. In such an instance you don't need consent, or even a cookie banner, because you're not dropping cookies, or collecting personal info. Profiling visitors is no longer possible, but you still get valuable data on visits.

Note that if you want to self-host Matomo, you don't need more than a VPS with 1 GB of RAM (even less but let's assume significant traffic) so it's cheap to self host too.

And I disagree with another commenter here saying Analytics is just for vanity. That's not true — even for a personal blog analytics are useful to see which articles are still being visited and thus need to be kept up to date, or in case content is deprecated, the least you could do is to put up a warning.

And if you write that blog with a purpose (e.g. promoting yourself or your projects) then you need to get a sense of how well your articles are received. You can't do marketing without a feedback loop.

[1] https://matomo.org/

[2] https://matomo.org/faq/general/faq_157/

[3] https://matomo.org/docs/privacy/

[+] Carpetsmoker|5 years ago|reply

> And I disagree with another commenter here saying Analytics is just for vanity. That's not true — even for a personal blog analytics are useful to see which articles are still being visited and thus need to be kept up to date, or in case content is deprecated, the least you could do is to put up a warning.

Some examples: I maintained a Vim ChangeLog for a while (which is quite some work), and turned out no one was reading that, so ... why bother?

In another case, I wrote an article about "how to detect automatically generated emails" and I thought it wasn't actually that interesting and no one read it so considered archiving it, but turned out quite a few people end up there through Google searches etc. and I ended up updating it instead of archiving it, as it was clearly useful to people.

[+] chrismorgan|5 years ago|reply

I self-hosted Matomo for a year and a half (and took over the AUR package for it and improved it in the process). It was no trouble to run, but I ended up uninstalling it late last year, for a few reasons: its interface is painfully slow (and that’s nothing to do with my 1GB/1 vCPU VPS—I’ve interacted with a decent-sized instance at innocraft.cloud and it was similar), and I seldom looked at it, and I couldn’t think of any way in which anything I found in the analytics would change my behaviour, and server-side analytics are good enough (better on some ways, worse in others), and I value speed. So all up, I figured: why am I slowing all my users down with this 50KB of JavaScript (of which I frankly need less than 1KB), and why am I keeping this software going?

So now I pull out GoAccess (which reads the server logs) from time to time. I find that my Atom feed is the vast majority of traffic to my site, which Matomo couldn’t tell me. I should implement pagination on the feed and see if that helps. (Or limit the number of items in the feed, but conceptually I rather like everything being accessible from the feed. Wonder how many feed readers support pagination?)

[+] mritchie712|5 years ago|reply

Our SaaS runs on Google App Engine and sending the logs to BigQuery only takes a couple clicks[1]. From there you can write SQL to summarize the data by referrer, page viewed, etc. Here[0] is a starting point, though you'll need update the `WHERE` clause so it works for your use case.

You get IP and user agent in those logs if you want to roughly track visit to conversion metrics.

0 - https://gist.github.com/mike-seekwell/83ac75c82a943e287a7abe...

1 - https://cloud.google.com/appengine/docs/standard/python/logs

[+] thomasahle|5 years ago|reply

> And I configured it on my websites with cookies turned off [2] and with IP anonymization [3].

Do you have a way to filter out your own visits in this case? On small pages I find that my own clicks and events during testing contaminates the statistics.

[+] GordonS|5 years ago|reply

> And I configured it on my websites with cookies turned off [2] and with IP anonymization [3]. In such an instance you don't need consent, or even a cookie banner, because you're not dropping cookies, or collecting personal info. Profiling visitors is no longer possible, but you still get valuable data on visits.

Does this mean each page hit cannot linked to be any other? For example, can I see that a visitor viewed a particular sequence of pages?

[+] bscphil|5 years ago|reply

> And I disagree with another commenter here saying Analytics is just for vanity. That's not true — even for a personal blog analytics are useful to see which articles are still being visited and thus need to be kept up to date, or in case content is deprecated, the least you could do is to put up a warning.

I mean, I'm not making the argument that analytics are useless, but this seems like the worst possible example. You can do this trivially with a script to analyze your server (e.g. Apache) logs. And you don't need "a VPS with 1 GB of RAM" for that - which is four times the RAM of the VPS my personal website has run on for the last half decade.

This approach also uses no client-side javascript to collect data, so you wouldn't have to alarm users with potential privacy threats, because nothing is stored other than what's in the HTTP headers.

[+] boromi|5 years ago|reply

How do you actually host this? Do I need a VPS or is more of like a Heroki thing?

[+] jwr|5 years ago|reply

I turned off Google Analytics, because I realized that it doesn't actually report any useful or actionable data, just vanity metrics, and many of them of dubious quality.

I run a SaaS and what matters for me is paid subscriptions. "Visits" (even if by humans, which is hard to tell) really do not matter much. Yes, I do want to increase conversion rates, and run bandit experiments, but I'm better off doing that myself.

What also matters are search terms, but Google's search console (or tools, or whatever it's called this week) provides that.

Turning off Google Analytics was hard to do psychologically — the Fear Of Missing Out is strong. But it turns out I'm not missing out on anything, except some dubious vanity data. And I'm making the web a better place in the process.

[+] sjwright|5 years ago|reply

I run a reasonably large website and about two years ago it dawned on me that I never checked Google Analytics. It was completely useless. It wasn't telling me anything useful. I also knew that it was marginally user hostile (or at least perceived as such) and affecting page performance, even if only slightly.

Removing it felt momentous and insane. But in November 2018 I finally plucked up the courage and removed it. The crazy thing is, until this article appeared on the top of Hacker News reminded me, I had completely forgotten that I had removed it. Far from the world ending, it turned out to be the most inconsequential thing imaginable.

(I remember pouring over web server logs in Analog and AWStats 15+ years ago. Now I honestly can't remember why. I think it was some combination of vanity... and because everyone else was doing it. I suspect for most web developers GA was just the natural evolution of that muscle memory.)

[+] JackWritesCode|5 years ago|reply

GA and AWStats are both awful products for a lot of people. For us, we check out Fathom dashboard daily to see referrers and popular content. And vitality (right now we can see a ton of traffic coming from HN). When I used GA, I never checked it.

[+] jbrooksuk|5 years ago|reply

I've been happily using Fathom Analytics: https://usefathom.com and I have zero complaints.

No tracking. Privacy focused. Lightweight. You embed from your own domain. They even do site monitoring now!

[+] Diesel555|5 years ago|reply

I run fathom self hosted. I don’t love it and am looking for an alternative. But this is because they don’t update the self hosted version.

I get it that I’m asking for a free service, I just kind of wish they never offered it if they were going to ditch it. I don’t make money off my sites. I wish they had a less than X income version to self host. Oh well.

If you are making money and willing to pay I have a feeling Fathom is great.

[+] justinclift|5 years ago|reply

Fathom started out with much fanfare as being Open Source, but as soon as their paid service gained enough traction they went proprietary only.

Can't recommend a company which pulls crap like that. :(

[+] GordonS|5 years ago|reply

> No tracking

Personally, I think that Fathom strikes a good balance between privacy and usability, but it does still use tracking (or at least it did when I was looking at it a few weeks back) - the difference is that it uses fingerprinting instead of cookies. I think it's implemented in a privacy-focused way, but it does look like they are ignoring some of the EU ePrivacy guidance, which explicitly states that consent should be obtained before using fingerprinting, even if PII can't be reverse-engineered from the fingerprint.

As I say, I think their implementation makes a lot of sense, and even as a privacy advocate myself I think those particular pieces of ePrivacy guidance focused on fingerprinting is excessive. But the EU doesn't seem to agree.

[+] lhdj|5 years ago|reply

My company switched to Fathom from GA about 4 days ago.

We build privacy software so it felt slightly hypocritical to use a privacy-intrusive service like GA. So far so good.

I went from 0 to Fathom in under 20 mins and for our basic requirements it works really well .

Good job Fathom team :)

[+] spockz|5 years ago|reply

From the site:

> Our on-demand, auto-scaling servers will never slow your site down. Our tracker file is served via our super-fast CDN, with endpoints located around the world to ensure fast page loads.

This suggests that this solution is not self hosted. Is there a solution like this which is really self hosted? This service is one small change away from actually tracking.

Edit: Piwik/Matomo[1] appears to be the most mature one. [1]: https://matomo.org/

[+] JackWritesCode|5 years ago|reply

Thanks James. You wait till we launch V3 ;)

[+] Longwelwind|5 years ago|reply

I've been using a similar tool: https://simpleanalytics.com/.

I wish they'd offer more plans between the first 2 cheapest, though. My open-source project is hitting the basic plan limits and the next offer is too expensive for me.

[+] remux|5 years ago|reply

Some weeks ago I discovered fathom and I am fully satisfied with it.

[+] kasbah|5 years ago|reply

I recently had a discussion about the interface with the Goatcounter developer [1]. Also put in a feature request with Posthog [2]. Hadn't heard of Plausible, maybe that's the one for me!

[1]: https://github.com/zgoat/goatcounter/issues/302

[2]: https://github.com/PostHog/posthog/issues/1020

[+] Hoasi|5 years ago|reply

> Hadn't heard of Plausible, maybe that's the one for me!

Plausible is pretty good, found it useful to monitor traffic and usage for small projects.

[+] AdriaanvRossum|5 years ago|reply

Thanks for mentioning Simple Analytics [1]. We are at this point indeed only cloud based. We believe we need to make a business case/profit first before putting a lot of extra work in a open source version and maybe failing with the business. It's a dream to make it open source, but not at this time.

We are very firm on our values. We will never sell your data. We have many ways to get your raw data out of our system (API, download links, ...).

Our collection script [2] is open source and today we are also adding source maps to our public scripts. Open source does not guarantee that a business runs that same software as their cloud based option. We are looking into services that can validate what we collect on our servers. We never collect any IPs of personal data [3].

Great to see more products that care about privacy, I hope they will really care and commit to their values for a long time.

[1] https://simpleanalytics.com

[2] https://github.com/simpleanalytics/scripts

[3] https://docs.simpleanalytics.com/what-we-collect

[+] joppy|5 years ago|reply

What kind of server-side analytics are people using today, for personal blogs and things? Projects like GoAccess which eat an nginx log file and output some analytics seem like a nice middle ground for those of us who want some feedback on how people are using a website, without needing all the bells and whistles of something more like Google Analytics (not to mention the fact that it doesn't need any Javascript loaded or anything). Personally I've found GoAccess pretty good, but the interface a little difficult to use and understand, so I'm looking for projects like it.

[+] PaulRobinson|5 years ago|reply

Server side was how it was always done back in the early days of the web, and analog[0] was state of the art.

Around 1999/2000 there was a rise of ISPs needing to install reverse proxy caches because the growth of consumer access meant they were getting seriously contended on upstream access. I was working at the time at a UK 0845 white label ISP called Telinco (was behind Connect Free, Totalise, Current Bun and other 0845 ISPs), and to my knowledge we were the first in the UK to install a Netapps cache. It was the moment we realised (by checking the logs to see if it was working), just how much porn our customers were accessing.

Those caches blow server side analytics to pieces, because frequently you wouldn't even know the user had hit the page. What server side analytics was useful for is what we'd now call Observability: they gave reasonable Latency, Error Rate and Throughput metrics, which combined with some other system logs might also give you a sense of Saturation.

As such, they were not too useful for marketing. Google Analytics was the first product that allowed high fidelity analytics even if reverse proxy caches (and even browser caches), were all over the place.

And here we are. In a World where we are tightly surveilled by corporate entities in order to try and get us to click on things. Bit sad really.

I'd encourage people to think about what they need these analytics for.

If it's marketing, you might just as well using GA: it's the best product out there. We just need to lobby for better regulation (at least GDPR and cookie setting popovers give us choices on that regard now).

If you're stroking your ego, consider whether such an invasive technology is worth the price, and if you need those numbers.

If you're making sure your infrastructure can handle the traffic, use server side analytics alone. Parse your logs using the huge number of tools out there able to do that in near-realtime, and leave your users' browsers free of tracking cookies and javascript.

[0] https://www.web42.com/analog/

[+] dig1|5 years ago|reply

Webalizer [1] can be alternative. No fancy UI, but gets job done. For anything heavier and serious, ELK [2].

[1] http://www.webalizer.org/

[2] https://www.elastic.co/what-is/elk-stack

[+] stephenr|5 years ago|reply

I've setup GoAccess for a client's site, the problem is it doesn't have a great HA solution.

You either ship all your logs to one place (and hope that place doesn't go offline) or ship your logs to multiple places and hope both destinations are in sync. We've opted for #2 right now (hint: it's not perfect) but it's made me think about writing an alternative.

Rather than shipping all the logs all around, my plan is to have each source (i.e. web server) run a process on it's own logs, and use something like Redis to store the aggregated statistics.

[+] srg0|5 years ago|reply

TIL about European Union Public License:

https://joinup.ec.europa.eu/collection/eupl/introduction-eup...

OSI-certified, copyleft, non-viral, GPL-compatible, SaaS-aware, multilingual

[+] Hitton|5 years ago|reply

I'm kinda confused about that, because https://www.gnu.org/licenses/license-list.html#EUPL-1.2 seems to say that it's possible to relicense the source to GPL which would go directly against Goatcounter's author who apparently wanted AGPL-ish license without ideological fluff.

[+] swyx|5 years ago|reply

how can it be both copyleft and non viral? isn't vitality a definitive feature of copy left?

[+] huhtenberg|5 years ago|reply

If you decide to migrate off GA, there's very little reason to not use self-hosted analytics.

The only case when you'd get better analytics from a _service_ is exactly a GA-like setup that can track people as they go from one website to another. That is, the real value of an analytics service is derived directly from its ability to invade people privacy, at scale.

Granted, migrating to another service is usually simpler, but it offers NO insights into the traffic that you can't get from parsing server logs and in-page pingbacks. You do however get a 3rd party dependency and a subscription fee.

[+] bttrfl|5 years ago|reply

Server logs only tell you about things that happen on your server. If you are using JavaScript it's likely there are plenty of events that might be valuable to you that never leave a trace in your logs.

For example, if you validate forms with JS you might want to track form submissions and validation errors.

[+] elondaits|5 years ago|reply

My reason is the server not being able to handle the traffic. We used Piwik but I couldn't trust it'd be able to handle big eventual spikes of traffic (which the site itself could, being static and on a CDN) or that it wouldn't slow the site down (if I remember correctly I had the option to call piwik asynchronously and not slow down the site, but at risk that it'd be less accurate if people closed the window / navigated to another page quickly.).

Of course you can run your own analytics on AWS or similar and have no issues with handling traffic, but that means higher costs / difficulty in setting up and maintaining it.

[+] lmkg|5 years ago|reply

> that can track people as they go from one website to another

Note that even in Google Analytics, this requires extra set-up, has limitations, and tends to be pretty fragile in practice. GA identifies users by a first-party cookie and tracking cross-site visits requires decorating links with cookie values.

If you're interested just in aggregate traffic from one of your sites to another, rather than something that requires full-path analysis (like marketing attribute), then you can get that from looking at referrers. This should be more-or-less equally available in GA and server logs.

[+] jedimastert|5 years ago|reply

> If you decide to migrate off GA, there's very little reason to not use self-hosted analytics.

My personal domain[0] was taken by domain squatters (forgotten bill in debit card shuffle, bought up within seconds of expire) so for now I have to host on github.io. Thoughts on an analytics service?

[0]: http://www.aarontag.com/

[+] bryanrasmussen|5 years ago|reply

>The only case when you'd get better analytics from a _service_ is exactly a GA-like setup that can track people as they go from one website to another.

I was once making a service that provided cross site widgets for companies to embed. Obviously it was beneficial to track people as they go from one website to another, but at that point it was beneficial to do it with our own service.

[+] dclusin|5 years ago|reply

I use GoAccess. It's an offline access.log analytics engine. One feature it has is to generate static site from its db. I have an hourly cron script that picks up the last hours logs and generates a static site. You can see it in action at https://www.clusin.com/analytics/

1 - https://github.com/allinurl/goaccess

[+] netcan|5 years ago|reply

Tangential but...

"Analytics" is rarely useful or unuseful because of the tool. These tools need to be treated as data collection, not reporting.

If your goal is to inform certain decisions, track success or identify problems... a spreadsheet (or napkin) is usually where that happens.

Say you do analysis systematically, make a list of questions and use your tools to answer them... usually you find that the tool itself doesn't matter much, and GA doesn't answer most of your questions out-of-the-box anyway.

Say you want a "funnel." That usually consists of a handful of data points. GA usually doesn't have them by default, without tinkering configuration, etc. Decide what they are beforehand. Understand them. Use GA (or whatever) to get the data.

Finding the tool for the job is much easier once you know what the job is. GA is extremely noisy, bombarding users with half-accurate, half-understood reports.

[+] Cenk|5 years ago|reply

I’ve been pretty happy with Matomo (formerly Piwik), especially their non-cookie mode. But the interface is ugly, confusing, and makes finding information much more difficult than Google Analytics does.

Edit: One major thing I am unhappy with in Matomo is event tracking. GA makes it much easier (in my experience) to track conversions and events, and presents the data in a better way.

[+] neilsimp1|5 years ago|reply

Off topic, but I used to run a website that had Google Analytics. This site and domain are now 100% down and have been for over a year.

I still get monthly emails from Google about the analytics for this website. Apparently it's getting 200-300 visitors per month still. I have replied back to Google vie email about this several times but never heard any reply. I wonder what site they are tracking?

[+] devalnor|5 years ago|reply

I use Ackee, also a simple and open source alternative https://ackee.electerious.com/

[+] ilovefood|5 years ago|reply

I've recently whipped up my own self-hosted analytics solution [0] based on SQLite, Bash and Metabase. It's all self hosted, easy to install and very flexible with regards to the queries you can write and display. Metabase comes with a lot of cool features for display, live reload and other cool stuff. :)

[0]: https://funnybretzel.com/self-hosted-analytics-using-sqlite-...

[+] nuccy|5 years ago|reply

I'm honestly curious, are all the analytics tools, which rely on making third party queries, still efficient with extensive use of adblocking these days?

If not, then logs of webservers are the only 100% reliable place (if available of course), so old-style tools like awstats, Webalizer, etc [1] should have a rise in popularity again.

[1] https://en.wikipedia.org/wiki/List_of_web_analytics_software

[+] tannhaeuser|5 years ago|reply

I really hope an analytic genius can come up with a technique (like differential privacy, but I'm no expert here) that would give advertisers what they want (unique visitor counts, and very few other metrics) to place ads on sites, yet doesn't give away too much privacy, nor leads to enslavement under a single central entity. I guess if something like that doesn't come along, then only old school content-based ads (site sponsoring) without any tracking can be considered ethical (or no-ads of course). The argument against content-based ads was always that it doesn't suffice to finance even web hosting let alone content production. But with ad prices going to the bottom, I wonder if the figures still add up in favour of targetted ads today.

[+] lmkg|5 years ago|reply

Apple/Webkit already has a proposal along those lines.

https://webkit.org/blog/8943/privacy-preserving-ad-click-att...

[+] ckotso|5 years ago|reply

Snowplow is also an option. It’s an open-source data collection solution that, unlike GA, gives you full ownership of your event-level data and the freedom to define your own data structures. Not exactly what you’d call ‘lightweight’ but quite a few Snowplow users/customers have come from GA for the level of flexibility and control they can have over their data sets.

(Full disclosure: I work for Snowplow Analytics)

- https://github.com/snowplow/snowplow

- https://snowplowanalytics.com/

[+] TomGullen|5 years ago|reply

I still can't see any solid reasons why a site owner would not use GA.

Other products:

- Objectively lack features

- Potentially incur extra costs in money/time

- May be a small barrier in m&a

- May carry additional risks/attack vectors if self hosted

Trying to ween off big tech is commendable, but likely detrimental to a business.

Relatively high risk, low reward.

I'm happy to have my mind changed. I can see a case for user hostility, but most sites I imagine don't have an audience sensitive to this at the moment anyway.

From an idealogical standpoint, other cloud stat tracking services would only function if not many people used them. And I would also imagine feature creep would be inevitable and lead them to becoming an inferior version of GA.

[+] XCSme|5 years ago|reply

Some issues with GA version going self-hosted: - Privacy of your users: For a specific user, Google knows all the website he visits - Privacy your data: If Google knows the visitors of most websites, your competitors can leverage that advantage (using Google Ads for example) to steal your potential customers. - Google Analytics is bloated and slow (both in terms of the tracking script and the dashboard UI, where it takes several seconds for each graph/page to load). - You don't own your data, at any point Google can, even though unlikely to, block your account (for breaking ToS of some other service of theirs) and you lose all your data. - If everyone uses GA, it will become (already is) an analytics monopoly, which has many other drawbacks (lack of innovation for example).

I do think that for the average user, using GA might be fine because it's free, easy to set-up and does its job. That is unless they care about all the possible consequences.

[+] Carpetsmoker|5 years ago|reply

GA alternatives are a fairly new thing. When I looked at this in May last year there was essentially only one alternative: Matomo. It seems some sort of "critical mass" has been reached, and in the last year quite a few people have independently started working on alternatives.

I agree for many features are still lacking, but as a counter-argument 1) not everyone needs those features (not every product needs to solve 100% of the use cases), and 2) a lot of these products are still quite new, and are actively working on adding a number of those features.

[+] gorkemcetin|5 years ago|reply

Countly [1] is another open source alternative to Google Analytics - suggest you try it on Digital Ocean [2] or deploy on your own [3].

It is self hosted, has support for desktop apps, mobile apps and web apps at the same time.

[1] https://count.ly

[2] https://marketplace.digitalocean.com/apps/countly-analytics

[3] https://github.com/countly/countly-server

319 comments