top | item 40446590

(no title)

mathisd | 1 year ago

During an internship, I was part of a team that developed a collection of tools [0] intended to provide pseudonymization of production database for testing and development purposes. These tools were developed while used in parallel with clients that had a large number of database.

Referential constraint refer to ensuring some coherence / basic logic in the output data (ie. the anonymized street name must exist in the anonymized city). This was the most time consuming phase of the pseudonymization process. They were working on introducing pseudonymization with cross-referential constraint which is a mess as constraint were often strongly intertwined. Also, a lot of the time client had no proper idea of what the field were and what they were truly containing (what format of phone number, we did find a lot of unusual things).

[0] (LINO, PIMO, SIGO, etc.) https://github.com/CGI-FR/PIMO

discuss

order

edrenova|1 year ago

yeah the referential integrity and constraints part is usually the most complicated part and everyone does things differently which adds another layer of complexity on it