Ask HN: Privacy-focused or useless analytics tools?
8 points| fsenart | 4 years ago
What is a visitor in a privacy-focused analytics tool? Can we have a returning visitor when it is not tied to identity over time and across visits? How can we even interpret these numbers?
Let's summarize the ladder of identity on the web: Logged in user > Persistent identity (e.g. cookie) > Ephemeral identity (e.g. 24h hash) > no identity.
Privacy-focused tools seem to provide the last two while promising the same advantages as the first two! Coolness over effectiveness?
What's all the fuss really about?
It's worth noting that the question is not about all the different kinds of statistics these tools can provide without relying on a cookie, but about the legitimacy and relevance of the visitor-related statistics (e.g., new, returning, etc.).
mtmail|4 years ago
Personally, I have a background in metrics and reporting tools. At previous companies I've been tasked with finding and explaining 0.3% differences between two reports, and I've had cookie-related (or timezone-related) code reviewed by other engineers. With millions of dollars at stake, in PowerPoint meetings or in investor or financial documentation, it makes sense to question every definition and the whole data pipeline.
> Coolness over effectiveness? What's all the fuss really about?
Ok, I admit, there's a bit of a coolness factor. Paying $25/month to a small bootstrapped company (with a great podcast) beats feeding data to an ever-growing global player (Google).
fsenart|4 years ago
XCSme|4 years ago
The privacy part, compared to other tools, comes from the fact that it's self-hosted, so no data is shared with 3rd parties, which is the best way to achieve data privacy. You can detect returning visitors in various ways; one option in userTrack [0] is to store the hash of the visitor's IP + user-agent string. It is not 100% accurate: if the visitor updates his browser or his IP changes, he will be considered a new user. If the user is logged in, you can tag each session with his username or user ID.
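To make the IP + user-agent idea concrete, here is a minimal sketch. It is not userTrack's actual code: the names `visitor_key` and `is_returning`, the salt value, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib

def visitor_key(ip, user_agent, site_salt="example-salt"):
    """Hypothetical helper: derive a pseudonymous key from IP + user-agent."""
    raw = f"{site_salt}|{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

# In-memory store of keys seen so far (a real tool would persist this).
seen = set()

def is_returning(ip, user_agent):
    """Return True if this IP + user-agent pair was seen before, then record it."""
    key = visitor_key(ip, user_agent)
    returning = key in seen
    seen.add(key)
    return returning
```

With this sketch, a second request from the same IP and user-agent is flagged as returning, while the same person after a browser update or an IP change produces a new key and is counted as new.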
Also keep in mind that fully persistent identities rarely exist (unless the user is logged in), as cookies can be cleared at any point or simply be blocked/reset by the browser on each visit.
PS: I do agree that many privacy-focused tools are not really private either, because they are still a 3rd party aggregating data across the web.
[0]: https://www.usertrack.net/
fsenart|4 years ago
Nowadays, privacy is a pretty convoluted word. I like to consider it from the point of view of the most impacted actor, the end-user. And from his perspective, you remain a third party as far as his data is concerned. The sole fact that the tool is self-hosted cannot be a guarantee of privacy. That said, stronger privacy is more likely when the number of third parties is small.
Therefore, with your tool:
- Either you have an identity (i.e., hash(x,y,z)) that is persisted over time (notwithstanding its accuracy).
- Or you have an identity that is forgotten after a certain period of time (e.g., 24h).
In the first case, it cannot be considered a privacy-focused tool, and in the second case, it has the same shortcomings I've described in the original question.
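The two cases can be sketched side by side. This is an illustration under assumptions, not any specific tool's implementation: the function names, the random daily salt, and the sample visits are all hypothetical.

```python
import hashlib
import secrets
from datetime import date

def _digest(*parts):
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def persistent_id(ip, user_agent):
    """Case 1: the same inputs map to the same ID on any day."""
    return _digest(ip, user_agent)

# Assumed design: one random salt per day. A real tool would also
# discard old salts so past IDs cannot be recomputed.
_daily_salts = {}

def ephemeral_id(ip, user_agent, day):
    """Case 2: the ID is salted with a value that changes every day."""
    salt = _daily_salts.setdefault(day, secrets.token_hex(16))
    return _digest(salt, ip, user_agent)

# One person visiting on three consecutive days (illustrative data).
visits = [(date(2021, 1, d), "203.0.113.7", "ExampleBrowser/1.0") for d in (1, 2, 3)]

persistent_uniques = len({persistent_id(ip, ua) for _, ip, ua in visits})
ephemeral_uniques = len({ephemeral_id(ip, ua, day) for day, ip, ua in visits})

print(persistent_uniques)  # 1
print(ephemeral_uniques)   # 3
```

The persistent hash recognizes one visitor across the three days, which is exactly what makes it trackable. The daily-salted hash counts the same person three times, so any "unique visitors" or "returning visitors" figure over more than a day is inflated or undefined.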
---
It is crucial to note that the question is about the quality of users' metrics in privacy-focused tools.
There ain't no such thing as a free lunch. End-user privacy comes at the expense of actionable metrics. Furthermore, at best, people using these tools are not aware of the shortcomings and the risk of misleading numbers. At worst, these very concerns are left out of the tools' marketing to minimize their real impact.
The above is an opinion, and I would like to debate it: my possible misunderstanding of these tools, and possible solutions.
elevate_lsk|4 years ago
fsenart|4 years ago
The metric is pretty simple: the number of unique visitors. And the setup is simple, too, with no identity whatsoever. But are they compatible? This is the question.
Your approach is interesting. You take a stance: let's have precise metrics when users consent and don't consider those who didn't give their consent. This approach relies on the underlying assumption that those who do not give their consent represent a minority and thus don't have a perceivable impact on the overall statistics. And this very assumption may be false.
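A quick back-of-the-envelope sketch of why that assumption matters. All numbers below are made up for illustration, and `returning_share` is a hypothetical helper; the point is only that the measured figure can drift from the real one when non-consenting visitors behave differently.

```python
def returning_share(groups):
    """groups: list of (visitors, returning_visitors) tuples."""
    total = sum(visitors for visitors, _ in groups)
    returning = sum(ret for _, ret in groups)
    return returning / total

# Hypothetical traffic: 40% of visitors consent, and they return more often.
consenting = (400, 200)      # 400 visitors, 200 of them returning
non_consenting = (600, 120)  # 600 visitors, 120 of them returning

measured = returning_share([consenting])                # what the tool sees
actual = returning_share([consenting, non_consenting])  # what really happened

print(f"{measured:.0%}")  # 50%
print(f"{actual:.0%}")    # 32%
```

In this made-up scenario the tool would report a 50% returning rate while the true rate is 32%, and nothing in the consented data alone reveals the gap.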