top | item 27076693

Ask HN: Privacy-focused or useless analytics tools?

8 points| fsenart | 4 years ago

The number of privacy-focused analytics tools is ever-growing, and their pitch is appealing: something fresh, set against the much-advertised evilness of legacy tools. Rightfully, all of them try to minimize identity to promote privacy. While that makes sense at first sight, questions arise about the usefulness of the resulting statistics.

What is a visitor in a privacy-focused analytics tool? Can we have a returning visitor when it is not tied to identity over time and across visits? How can we even interpret these numbers?

Let's summarize the ladder of identity on the web: Logged in user > Persistent identity (e.g. cookie) > Ephemeral identity (e.g. 24h hash) > no identity.

Privacy-focused tools seem to provide the last ones while advertising the same benefits as the first ones! Coolness over effectiveness?

What's all the fuss really about?

It's worth noting that the question is not about all the different kinds of statistics these tools can provide without relying on a cookie, but about the legitimacy and relevance of the visitor-related statistics (e.g., new, returning, etc.).

10 comments

mtmail|4 years ago

As a small company using one of the providers (https://usefathom.com/), it offers us choice. We don't want to build and run our own analytics, and we don't need the vast feature set of Google Analytics or similar either. Even without the privacy benefit, the set of metrics we get now is acceptable. We don't have, e.g., a growth manager who needs to report these, or investors who question them or make decisions when one metric looks off. Legitimacy didn't come up once during our migration, and we didn't look too closely to compare to our previous provider (Google Analytics). It's not the core of your question, but there are use cases where fewer metrics and less accuracy are good enough.

Personally, I have a background in metrics and reporting tools. At previous companies I've been tasked to find and explain 0.3% differences between two reports, and have had cookie-related (or timezone-related) code reviewed by other engineers. With millions of dollars at stake, PowerPoint meetings, or investor or financial documentation, it makes sense to question every definition and the whole data pipeline.

> Coolness over effectiveness? What's all the fuss really about?

Ok, I admit, there's a bit of coolness factor. Paying $25/month to a small bootstrapped company (with a great podcast) beats feeding data to an ever growing global player (Google).

fsenart|4 years ago

Thank you so much for your thorough answer. I can totally understand what you say about helping seeds grow and the "analytics fatigue" when it comes to legacy tools. I also understand that you are not much interested in how your visitor base or visit cohorts evolve. Beyond that, please share any hints you have about how you might estimate these metrics with your current tools.

XCSme|4 years ago

I am creating a "privacy-focused" analytics tool[0], that actually provides useful stats.

The privacy part, compared to other tools, comes from the fact that it's self-hosted, so no data is shared with 3rd parties, which is the best way to achieve data privacy. You can detect returning visitors in various ways; one option in userTrack is to store the hash of the visitor's IP + user-agent string. It is not 100% accurate: if the visitor updates his browser or his IP changes, he will be considered a new user. If the user is logged in, you can tag each session with his username or user ID.
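
For readers who want to see the idea concretely, here is a minimal sketch of returning-visitor detection from a hash of IP + user-agent. It is not userTrack's actual code; the function names, the choice of SHA-256, and the in-memory `seen` set are assumptions made for illustration only.

```python
import hashlib

def visitor_id(ip: str, user_agent: str) -> str:
    """Hypothetical helper: derive a pseudo-identity from IP + user-agent."""
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

# Stand-in for whatever storage a real tool would use (assumption).
seen = set()

def is_returning(ip: str, user_agent: str) -> bool:
    """Return True if this IP + user-agent pair was already recorded."""
    vid = visitor_id(ip, user_agent)
    if vid in seen:
        return True
    seen.add(vid)
    return False

# Same visitor twice, then the same visitor after a browser update.
first = is_returning("203.0.113.7", "Mozilla/5.0 Firefox/88.0")
second = is_returning("203.0.113.7", "Mozilla/5.0 Firefox/88.0")
after_update = is_returning("203.0.113.7", "Mozilla/5.0 Firefox/89.0")
```

The third call illustrates the inaccuracy mentioned above: once the user-agent string changes, the hash changes, and the same person is counted as new.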

Also keep in mind that fully persistent identities rarely exist (unless the user is logged-in), as the cookies can be cleared at any point or simply be blocked/reset by the browser on each visit.

PS: I do agree that many privacy-focused tools are also not really private, because they still are a 3rd-party aggregating data across the web.

[0]: https://www.usertrack.net/

fsenart|4 years ago

Thank you very much for your comment.

Nowadays, privacy is a pretty convoluted word. I like to consider it from the point of view of the most impacted actor, the end-user. And from his perspective, you remain a third party as far as his data is concerned. The sole fact that the tool is self-hosted cannot be a guarantee of privacy, though stronger privacy is more likely when the number of third parties is small.

Therefore, with your tool:

- Either you have an identity (i.e., hash(x,y,z)) that is persisted over time (notwithstanding its accuracy).

- Or you have an identity that is forgotten after a certain period of time (e.g., 24h).

In the first case, it cannot be considered a privacy-focused tool, and in the second case, it has the same shortcomings I've described in the original question.
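
To make the two cases concrete, here is a rough sketch comparing a persistent hash with one that rotates every 24h. The daily salt scheme, the `secret` value, and the sample visits are all assumptions for illustration; they are not taken from any specific tool.

```python
import hashlib
from datetime import date

def persistent_id(ip: str, user_agent: str) -> str:
    """Case 1: the same inputs always map to the same identity."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

def ephemeral_id(ip: str, user_agent: str, day: date,
                 secret: str = "site-secret") -> str:
    """Case 2: the identity is salted with the day, so it is forgotten after 24h.

    The salt construction is hypothetical; a real tool might use a random
    salt that is discarded daily.
    """
    salt = f"{secret}|{day.isoformat()}"
    return hashlib.sha256(f"{salt}|{ip}|{user_agent}".encode("utf-8")).hexdigest()

# Made-up traffic: visitor A comes twice on day 1 and once on day 2,
# visitor B comes once on day 2.
visits = [
    (date(2021, 5, 7), "203.0.113.7", "Firefox/88.0"),
    (date(2021, 5, 7), "203.0.113.7", "Firefox/88.0"),
    (date(2021, 5, 8), "203.0.113.7", "Firefox/88.0"),
    (date(2021, 5, 8), "198.51.100.2", "Chrome/90.0"),
]

persistent_uniques = len({persistent_id(ip, ua) for _, ip, ua in visits})
ephemeral_uniques = len({ephemeral_id(ip, ua, d) for d, ip, ua in visits})
```

With the persistent hash, the two-day total is 2 unique visitors. With the daily hash it is 3, because visitor A's return on day 2 cannot be linked to day 1 and is indistinguishable from a new visitor.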

---

It is crucial to note that the question is about the quality of users' metrics in privacy-focused tools.

There ain't no such thing as a free lunch. End-user's privacy comes at the expense of actionable metrics. Furthermore, at best, people using these tools are not aware of the shortcomings and the risk of misleading numbers. At worst, these very concerns are kept out of the marketing speeches of these tools to minimize their real impact.

The above is an opinion, and I would like to debate it: my possible misunderstanding of these tools, and possible solutions.

elevate_lsk|4 years ago

We at Splitbee (https://splitbee.io) are trying to solve this with a hybrid approach. We allow people to track visitors without a cookie, and if they get consent later on, we can assign those visitors a cookie. Generally, it depends on what data you want to get out of your analytics tool. For a ton of websites, this data is more than enough.

fsenart|4 years ago

Thank you for your insights.

The metric is pretty simple, the number of unique visitors. And the setup is simple, too, with no identity whatsoever. But are they compatible? This is the question.

Your approach is interesting. You take a stance: let's have precise metrics when users consent and don't consider those who didn't give their consent. This approach relies on the underlying assumption that those who do not give their consent represent a minority and thus don't have a perceivable impact on the overall statistics. And this very assumption may be false.
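
A small back-of-the-envelope sketch of why that assumption matters. All rates below are made-up numbers, and the function name is hypothetical; the point is only that if new and returning visitors consent at different rates, the share of returning visitors measured among consenting users drifts away from the true share.

```python
def observed_returning_share(consent_rate_new: float,
                             consent_rate_returning: float,
                             true_returning_share: float) -> float:
    """Share of returning visitors as seen among consenting visitors only."""
    returning = true_returning_share * consent_rate_returning
    new = (1 - true_returning_share) * consent_rate_new
    return returning / (returning + new)

# True share of returning visitors is 30% in both scenarios (assumed).
# Scenario 1: everyone consents at the same rate, so the estimate is unbiased.
unbiased = observed_returning_share(0.5, 0.5, 0.3)

# Scenario 2: returning visitors consent far more often than new ones.
biased = observed_returning_share(0.2, 0.8, 0.3)
```

In the second scenario the tool would report roughly 63% returning visitors instead of the true 30%, even though nothing in the measured data itself looks wrong.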