(no title)

FooBarBizBazz | 4 months ago

Isn't this solved with salt?

bob1029|4 months ago

This is how I did it. You generate a salt per logging context and combine it with the base value into a SHA-2 hash. The idea is that you ruin the ability to correlate PII across multiple instances in different isolated activities. For example, if John Doe opened a new account and then added a co-owner after the fact, it wouldn't be possible for my team to determine that it was the same person from the perspective of our logs.

This isn't perfect, but there hasn't been a single customer (bank) that pushed back against it yet.

Salting does mostly solve the problem from an information theory standpoint. Correlation analysis is a borderline paranoia thing if you are practicing reasonable hygiene elsewhere.
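A minimal sketch of the per-context salting described above. The helper names, the 32-byte salt length, the choice of SHA-256, and the plain salt-plus-value concatenation are all assumptions for illustration; the comment only says a per-context salt is combined with the base value into a SHA-2 hash.

```python
import hashlib
import secrets


def new_context_salt() -> bytes:
    # Hypothetical helper: one fresh random salt per logging context.
    return secrets.token_bytes(32)


def pseudonymize(value: str, salt: bytes) -> str:
    # Assumption: the salt is simply prepended to the value before hashing.
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Two isolated activities, each with its own salt.
open_account_salt = new_context_salt()
add_co_owner_salt = new_context_salt()

token_a = pseudonymize("John Doe", open_account_salt)
token_b = pseudonymize("John Doe", add_co_owner_salt)

# Within one context the token is stable; across contexts it differs,
# so the two log entries cannot be joined on the token alone.
same_context_again = pseudonymize("John Doe", open_account_salt)
```

Here `token_a` and `same_context_again` match, while `token_a` and `token_b` do not, which is the property that blocks correlation across activities.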

hlieberman|4 months ago

If it's salted, you can't share it with a third party and determine which customers you have in common. (That's the point of the salt: to ensure that my_hash(X) != your_hash(X).)

OutOfHere|4 months ago

You actually can join it when the salt provider is a dedicated shared entity. The entity rehashes the data of both organizations to use a shared salt. That is how different organizations join hashed data.
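A rough sketch of that kind of join, under stated assumptions: the shared entity holds a single secret salt, and it receives from each organization an input that is identical for the same customer (raw identifiers here, for simplicity). The function name, the example addresses, and the use of SHA-256 are hypothetical.

```python
import hashlib
import secrets


def shared_token(value: str, salt: bytes) -> str:
    # Same salt for both organizations, so equal inputs give equal tokens.
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Assumption: only the shared entity ever sees this salt.
shared_salt = secrets.token_bytes(32)

org_a_customers = {"alice@example.com", "bob@example.com"}
org_b_customers = {"bob@example.com", "carol@example.com"}

tokens_a = {shared_token(c, shared_salt) for c in org_a_customers}
tokens_b = {shared_token(c, shared_salt) for c in org_b_customers}

# The join happens on tokens; neither organization learns the other's
# non-overlapping customers from the token sets.
common = tokens_a & tokens_b
```

In this toy data set `common` contains exactly one token, the one for the customer both organizations share.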

jstanley|4 months ago

> A 2020 MacBook Air can hash every North American phone number in four hours

If you added a salt, this would still allow you to reverse some particular hashed phone number in about 4 hours; it just wouldn't allow you to do all of them at the same time.
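A small sketch of that attack, assuming the attacker knows the salt (for example because it is stored next to the hash). To keep it fast, the search here covers only the last four digits of a made-up number; the function names, the salt, and the number are all hypothetical.

```python
import hashlib
from typing import Optional


def salted_hash(number: str, salt: bytes) -> str:
    return hashlib.sha256(salt + number.encode("ascii")).hexdigest()


def brute_force(target: str, salt: bytes, prefix: str, digits: int) -> Optional[str]:
    # Enumerate every candidate in the small, structured input space.
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        if salted_hash(candidate, salt) == target:
            return candidate
    return None


# Assumption: this salt is known to the attacker.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
target = salted_hash("5555550142", salt)

recovered = brute_force(target, salt, prefix="555555", digits=4)
```

The salt forces a separate pass per salted value, but each pass is still bounded by the size of the phone-number space rather than by the size of the salt.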

OutOfHere|4 months ago

I do not agree. How will you reverse a salt with sufficient entropy? Imagine the salt is a 512-bit hex value, the data is a ten-digit decimal phone number, and the generated hash is 512 bits, of which the first 160 bits are used as the value. Now exactly how will you get the phone number back? Do you really think you can iterate over half of the possibilities of 512 bits in four hours?

chrisandchris|4 months ago

A salt is very good if the input varies. If the input stays within a pre-defined range (e.g. phone numbers), salt does not work very well.

OutOfHere|4 months ago

I do not agree that it doesn't work very well. How will you reverse a salt with sufficient entropy? Imagine the salt is a 512-bit hex value, the data is a nine-digit decimal SSN, and the generated hash is 512 bits, of which the first 160 bits are used as the value. Now exactly how is the salt not good enough?
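The disagreement in this subthread seems to turn on whether the attacker knows the salt. A back-of-the-envelope sketch, using the sizes mentioned above (ten-digit phone numbers, nine-digit SSNs, a 512-bit salt); the variable names are only for illustration.

```python
import math

# Entropy of the inputs, treating every digit string as possible.
phone_bits = math.log2(10 ** 10)  # about 33.2 bits
ssn_bits = math.log2(10 ** 9)     # about 29.9 bits
salt_bits = 512

# Case 1: salt is known to the attacker (e.g. stored with the hash).
# Only the input has to be guessed.
known_salt_work_phone = phone_bits
known_salt_work_ssn = ssn_bits

# Case 2: salt is secret. Salt and input must be guessed together.
secret_salt_work_phone = salt_bits + phone_bits
secret_salt_work_ssn = salt_bits + ssn_bits
```

With a known salt the work stays around 2^33 or 2^30 hashes per target, which is the four-hour scenario; with a secret 512-bit salt it is beyond 2^540, which is the scenario in the comment above.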