This is not for privacy. It is done for the sellers and buyers of PII: buyers do not want to buy data they already own, and sellers don't want to disclose data before they sell it.
Yeah, if you want to check whether a user is in someone else's database, you ask the user whether the check can be performed. At that point the check is already done. If the user doesn't agree, then even if he is in the other database, it is not for you to make that check.
Serious private set intersection uses fully homomorphic encryption (FHE) or equivalent mechanisms. Microsoft Edge's compromised password detection uses FHE, for instance:
That said, surprisingly few people are aware of this fact, even senior technical leadership at Big Tech companies, so I'm not surprised dodgy Ad-Tech companies are not either, and it might be an illustration of Hanlon's Razor: do not ascribe to malice what can be better explained by incompetence (even if ad-tech companies long ago forfeited the benefit of the doubt).
The company I work for has a similar, yet even worse instance of this. The employee satisfaction survey was advertised as anonymous, but when I looked into the implementation they were just hashing the email address, of which there were only a few thousand. A more conspiratorial mind would conclude that it is to easily be able to find who a particular piece of feedback came from, but in this case I legitimately think it's just incompetence and not being able to figure out a better way of ensuring each employee can only submit the survey once.
This year it's advertised as confidential, rather than anonymous, so I suppose that is an improvement.
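To illustrate why hashing a few thousand email addresses does not anonymise them, here is a minimal sketch. It assumes the survey stored an unsalted SHA-256 of the lowercased address; the comment above does not say which hash was used, and the directory entries and function names are made up for the example.

```python
import hashlib


def hash_email(email: str) -> str:
    """Unsalted SHA-256 of a normalised email address (assumed scheme)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical staff directory; a real one would hold a few thousand entries.
directory = ["alice@example.com", "bob@example.com", "carol@example.com"]

# Precompute hash -> email once. For a few thousand entries this is trivial work.
lookup = {hash_email(e): e for e in directory}

# Identifier stored next to an "anonymous" survey response.
stored_id = hash_email("Bob@example.com")

# Anyone holding the directory can map the identifier back to a person.
recovered = lookup.get(stored_id)
print(recovered)
```

Anyone with access to the employee directory can rebuild this lookup table, so the hash works as a stable identifier and gives no anonymity.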
Not calling it anonymous is an improvement. Before I retired, I read many "anonymous" surveys taken by my reports. Any free-form text in the survey that goes beyond a sentence fragment usually made it obvious who wrote it. At least in the case of my teams, writing styles tended to be pretty distinct, as were the things each person cared about enough to write at any length. I tried to ignore the clues, but it was usually so obvious that it jumped out at me. The people administering such things insisted that anonymous meant their name wasn't on it, so it was fair to call it that.
Yea, this is pretty annoying and not the only problem in this field. There's a bunch of theater or misunderstanding in the marketing space. I feel like marketing people just don't get it. They seem to be hopelessly incapable of accepting that matching people in whatever way possible is the exact practice that laws like GDPR are trying to target. You cannot get around it by hashing, fingerprinting, ad IDs, cookieless matching or whatever.
They’re heavily incentivised to not get it, both internally with company KPIs that have not kept pace with the reality of GDPR and externally through ad platforms that continue to demand excessive amounts of data without providing suitable alternatives.
I also think that many vendors in this space are abusing the fact that marketers are not technical people, so they just wave around some "we're GDPR ready", "anonymized data" slogans such that marketers feel that they can tick the "GDPR" box and get all the metrics they are used to.
While of course not realising that GDPR implementation is partially on them and that some of those metrics are literally impossible to implement without breaching into GDPR territory. Any company saying that they are "fully GDPR compliant" but also giving you retention and attribution metrics by default is probably confusing you in this way.
This is how I did it. You generate a salt per logging context and combine it with the base value into a SHA-2 hash. The idea is that you ruin the ability to correlate PII across multiple instances in different isolated activities. For example, if John Doe opened a new account and then added a co-owner after the fact, it wouldn't be possible for my team to determine that it was the same person from the perspective of our logs.
This isn't perfect, but there hasn't been a single customer (bank) that pushed back against it yet.
Salting does mostly solve the problem from an information theory standpoint. Correlation analysis is a borderline paranoia thing if you are practicing reasonable hygiene elsewhere.
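A rough sketch of the per-context salting described above, not the commenter's actual code. The `LoggingContext` class, the 16-byte salt, and the choice of SHA-256 with the salt prepended are all assumptions made for illustration.

```python
import hashlib
import secrets


class LoggingContext:
    """Hypothetical logging context that owns one random salt."""

    def __init__(self) -> None:
        # Fresh salt per context; never shared with other contexts.
        self.salt = secrets.token_bytes(16)

    def pseudonym(self, pii: str) -> str:
        # SHA-256 over salt || value (assumed construction).
        return hashlib.sha256(self.salt + pii.encode("utf-8")).hexdigest()


# Two isolated activities, each with its own context.
open_account = LoggingContext()
add_co_owner = LoggingContext()

a1 = open_account.pseudonym("John Doe")
a2 = open_account.pseudonym("John Doe")
b = add_co_owner.pseudonym("John Doe")

print(a1 == a2)  # same context: log lines still correlate
print(a1 == b)   # different contexts: no match on the hash alone
```

Within one context the pseudonym is stable, so a single activity can still be traced through its own log lines. Across contexts the values differ, which is the correlation-breaking property the comment relies on.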
If it's salted, you can't share it with a third party and determine who your customers in common are. (That's the point of the salt: to ensure that my_hash(X) != your_hash(X).)
> A 2020 MacBook Air can hash every North American phone number in four hours
If you added a salt, this would still allow you to reverse some particular hashed phone number in about 4 hours; it just wouldn't allow you to do all of them at the same time.
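A small sketch of that point, assuming the attacker knows the salt (it is usually stored next to the hash) and that the scheme is SHA-256 over salt plus digits. The `crack` helper, the salt value, and the phone number are invented for the example, and the search covers only a 4-digit suffix so it finishes instantly.

```python
import hashlib
from typing import Optional


def salted_hash(salt: bytes, phone: str) -> str:
    """SHA-256 over salt || phone digits (assumed scheme)."""
    return hashlib.sha256(salt + phone.encode("ascii")).hexdigest()


def crack(salt: bytes, target: str, prefix: str, digits: int) -> Optional[str]:
    """Try every number made of `prefix` followed by `digits` digits."""
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        if salted_hash(salt, candidate) == target:
            return candidate
    return None


salt = b"known-to-attacker"          # hypothetical salt value
target = salted_hash(salt, "2125550147")

# Exhaust the 10,000 numbers under one area code and exchange.
found = crack(salt, target, prefix="212555", digits=4)
print(found)
```

The salt stops one precomputed table from covering every record, but each individual record still sits in a keyspace small enough to enumerate.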
To me, cracking hashes seems irrelevant in the grand scheme of things.
All the laws were passed so that companies don't compare their customer lists without asking the customer first.
I hope some government agency picks that up and strikes such BS with might.
If you are a BambooHR customer with people in your HR system, you have to ask each person whether you can check if they are in BambooHR. Guess what: whether they say no or yes, you already have half of the job done.
Putting it into a hash and seeing if you have it in your database is still sharing that requires consent. Fuckers.
blitzar|4 months ago
There is no honour amongst data thieves.
ozim|4 months ago
fmajid|4 months ago
https://www.microsoft.com/en-us/research/blog/password-monit...
If anything, this article understates the problem. A single Nvidia RTX 4090 can calculate 164 billion MD5 hashes per second running hashcat:
https://gist.github.com/Chick3nman/32e662a5bb63bc4f51b847bb4...
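As a back-of-the-envelope check using the rate quoted above, treat every 10-digit string as a candidate phone number. That is an upper bound, since fewer are valid numbers, and the calculation ignores candidate-generation overhead.

```python
HASHES_PER_SECOND = 164e9  # MD5 rate quoted above for one RTX 4090
KEYSPACE = 10 ** 10        # upper bound: every 10-digit string

seconds = KEYSPACE / HASHES_PER_SECOND
print(f"{seconds:.3f} s")  # prints "0.061 s"
```

At that rate the whole keyspace takes a fraction of a second, compared with the four hours the article quotes for a laptop.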
nevon|4 months ago
rented_mule|4 months ago
rdtsc|4 months ago
panstromek|4 months ago
iamacyborg|4 months ago
panstromek|4 months ago
FooBarBizBazz|4 months ago
bob1029|4 months ago
hlieberman|4 months ago
jstanley|4 months ago
chrisandchris|4 months ago
ozim|4 months ago
meindnoch|4 months ago
Nextgrid|4 months ago