top | item 31189133

evoxmusic | 3 years ago

Do you consider transformed data in staging harmful? (Transformed data = where all the sensitive data have been hidden)

onion2k|3 years ago

I consider it potentially harmful. Anonymizing data is a hard problem, and what counts as sensitive is not settled. For example, an IP address is personally identifiable information under the GDPR, yet most people don't mask it in their logs. If you copy records from production that contain network information (a last-known IP, for example), your data controller should be very concerned.
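As a rough illustration of masking an IP before it leaves production, here is a minimal sketch that truncates addresses to their network prefix. The function name `mask_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for the example, and truncation alone may not be enough to count as anonymization in every setting.

```python
import ipaddress


def mask_ip(raw: str) -> str:
    """Zero the host bits of an IP address (hypothetical helper).

    Prefix lengths are illustrative assumptions: /24 for IPv4, /48 for IPv6.
    """
    addr = ipaddress.ip_address(raw)
    prefix = 24 if addr.version == 4 else 48
    # strict=False lets ip_network accept an address with host bits set
    # and drop those bits instead of raising ValueError.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


masked_v4 = mask_ip("203.0.113.77")
masked_v6 = mask_ip("2001:db8:abcd:1234::1")
```

A one-way transform like this keeps coarse network information for debugging while dropping the part of the address that points at a single host.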

Another major problem with tools like RepliByte is that people set them up properly, then the database schema changes and nobody updates the script to anonymize the new tables or columns. A few months later someone notices that sensitive data has made its way into staging, into the backups, and into the database dumps devs made to debug things because "it's only staging data, who cares!"

Protecting user data is something that you need to be extremely vigilant about. In my experience, the less access I have to production data the happier I am. Copying it and using it in staging, even if you're careful about it, fills me with dread.

evoxmusic|3 years ago

That makes sense to me, which is why:

  1. Auto-detection of sensitive data is planned

  2. Detecting database schema changes is also planned, to prevent leaking sensitive data.

RepliByte responds to a very common need that almost every company ends up building internally. The idea is to work collaboratively on a tool that anyone can use and that can be improved to avoid leaking data.
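To make the schema-change idea concrete, here is a minimal sketch of a check that fails closed: every column must appear in a reviewed list, and anything new blocks the copy until someone looks at it. This is not RepliByte's implementation. The function `find_unreviewed_columns`, the `reviewed` mapping, and the use of SQLite are all assumptions made to keep the example self-contained.

```python
import sqlite3


def find_unreviewed_columns(conn, reviewed):
    """Return sorted (table, column) pairs that exist in the database
    but are missing from the `reviewed` mapping (hypothetical helper)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    unreviewed = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if col[1] not in reviewed.get(table, set()):
                unreviewed.append((table, col[1]))
    return sorted(unreviewed)


# Demo: the schema gained a `last_ip` column that nobody has reviewed yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, last_ip TEXT)")
reviewed = {"users": {"id", "email"}}
missing = find_unreviewed_columns(conn, reviewed)
```

Running a check like this before each dump, and aborting when `missing` is non-empty, turns a forgotten script update into a loud failure instead of a silent leak.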