(no title)
ff7c11|3 years ago
Trying to think how to anonymise datetimes hurts my head. You might want to randomise the date of an event. But you also need this random date to be consistent with respect to both the current time and the order of other related rows in the database.
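One way to get that consistency, as a rough sketch rather than a recommendation: give every row belonging to the same entity the same backwards-only offset. Order and gaps between related rows survive, and nothing lands in the future as long as the originals were not in the future. The names `shift_for`, `anonymise_timestamps`, `secret` and `max_days` are made up for illustration. Because the intervals are preserved, this is weak protection on its own.

```python
import hashlib
import hmac
from datetime import datetime, timedelta


def shift_for(entity_id: str, secret: bytes, max_days: int = 30) -> timedelta:
    """Derive a stable backwards offset of 1..max_days days for one entity."""
    digest = hmac.new(secret, entity_id.encode("utf-8"), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % max_days + 1
    # Always shift into the past so a shifted date never exceeds "now".
    return -timedelta(days=days)


def anonymise_timestamps(entity_id: str, timestamps: list, secret: bytes) -> list:
    """Apply the same offset to every timestamp of one entity.

    A constant offset keeps the relative order and the gaps between rows.
    """
    offset = shift_for(entity_id, secret)
    return [t + offset for t in timestamps]


if __name__ == "__main__":
    events = [datetime(2021, 1, 1, 9, 0), datetime(2021, 1, 3, 12, 30)]
    print(anonymise_timestamps("user-42", events, b"hypothetical-secret"))
```

Deriving the offset from a keyed hash instead of a random number means related rows in other tables get the same shift without having to store a lookup table, assuming the key is kept away from whoever receives the data.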
lstamour|3 years ago
I get that you can look up or de-anonymize an event by its timestamp, and the same is true of ID numbers. But it’s worse for ID numbers because these are often permanent and reused across multiple events.
But yeah, the risk in anonymized data is that it’s never truly both anonymous and useful. Truly anonymous data might be considered junk or random data.
Anonymized data has to serve some purpose. Perhaps “realistic” analytics are required, or you want to troubleshoot a production issue without revealing to engineers who did what. So you anonymize the fields they shouldn’t see, and create a subset of data that reproduces the issue…?
Anonymized data is almost always a bad approach compared to generating data from algorithmic or random sources, but sometimes we need anonymized or restricted data to start that process.
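For the "anonymize the fields they shouldn't see" step, a minimal sketch, assuming rows are plain dicts: replace ID fields with a keyed hash so the same ID still lines up across events, and drop the fields engineers have no need for. `pseudonymise_id`, `restrict_row`, and the field names are hypothetical. Strictly speaking this is pseudonymisation, so it carries exactly the re-identification risk described above.

```python
import hashlib
import hmac


def pseudonymise_id(value: str, secret: bytes) -> str:
    """Map an ID to a stable token; the same input always gives the same token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; the length is arbitrary here


def restrict_row(row: dict, secret: bytes, id_fields: set, drop_fields: set) -> dict:
    """Return a copy of the row with ID fields tokenised and sensitive fields removed."""
    out = {}
    for key, value in row.items():
        if key in drop_fields:
            continue  # the field is not needed to reproduce the issue
        if key in id_fields:
            out[key] = pseudonymise_id(str(value), secret)
        else:
            out[key] = value
    return out


if __name__ == "__main__":
    row = {"user_id": 1017, "email": "someone@example.com", "action": "checkout_failed"}
    print(restrict_row(row, b"hypothetical-secret", {"user_id"}, {"email"}))
```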
BobbyJo|3 years ago
A good example is: https://gretel.ai/blog/gretel-ai-illumina-using-ai-to-create...
Full disclosure, I work at Gretel, but I thought this was relevant enough to mention.