

Not a chance. It's never a single engineer: code gets reviewed in a PR by another engineer, and the Jira ticket will be specific about any PII, probably written by committee, all of whom know the importance of the data. Don't conflate this crap with blaming a single nebulous engineer.

I've not worked in years at a place that wouldn't understand the importance of PII. Not that it doesn't happen, but let's not mince words here - this was wilfully done.


tidepod12|6 years ago

Your comment made me audibly laugh at the notion that most companies would have a committee checking PRs and Jira tickets for PII. I've worked at plenty of companies, even ones at the scale of Twitter and larger, that don't approach anything even remotely close to that level of sophistication. I've seen audits uncover precisely what the GP comment is talking about. IME, it's not at all uncommon for someone to send an email saying "hey can I get a dump of usernames and phone numbers" and some naive engineer dumps it into a CSV file and sends it to whoever. Hell, most of the places I consulted at don't even consider phone numbers to be protected PII.

I don't mean to defend Twitter in any way, but I could easily see this being an oversight or a mistake.

danShumway|6 years ago

I bet if we could get a hard percentage of companies that have strict access rules for engineers around even just sensitive data in general, let alone PII, that would easily be <50%.

It's entirely plausible to me that this was a mistake. I think people who assume it was deliberate are, ironically, putting more trust in tech companies than they should.

Most of the world is being held together by duct tape, fastened by people who don't understand the systems they're fixing or maintaining. I don't think tech companies are an exception to that rule.

eitally|6 years ago

fwiw, Google at least has policies around how to handle PII in support tickets, as well as how to handle PII in bugs reported against public-facing-ish software (like Chrome). That's not to say there can't be bad actors or lapses due to poor training or inappropriate behavior, but the tools & policies exist.

verst|6 years ago

I get your perspective and skepticism, I really do. I have no incentive to defend Twitter. I cannot say whether this was done deliberately or not, but it absolutely could have been a mistake by a single engineer at Twitter.

The JIRA will just have been something vague like "add support for phone number matching to tailored audience matching pipeline" likely created by a manager on the ads infra team. Context will have already been assumed. Given that these are simple data pipelines there likely will not have been a design document specifically calling out the fields to match against for this task.

At Twitter it was also possible to deploy these Hadoop jobs without checking in code. They would need to be run as the main ads system service accounts, but most ads engineers should have had the ability to deploy such a job.

As I mentioned earlier, the fragility I observed in this part of the ads infrastructure in 2015 makes me believe that a mistake is entirely possible here.

Example: a Hadoop job writes an output file to HDFS, and a different job reads files from a particular location on HDFS and processes them. If no files exist there, there must not have been anything to process, right? But it could just as well have been that the first Hadoop job failed and nobody noticed.
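One common guard against that failure mode is to have the producer write a completion marker and have the consumer treat a missing marker as an error rather than as "no input". Hadoop MapReduce's FileOutputCommitter writes an empty `_SUCCESS` file into the output directory by default for this purpose. The sketch below is a hypothetical Python illustration on a local filesystem, not anything from Twitter's actual pipeline; `list_inputs` and `UpstreamNotFinished` are made-up names.

```python
import tempfile
from pathlib import Path

SUCCESS_MARKER = "_SUCCESS"  # assumed marker name, mirroring Hadoop's default


class UpstreamNotFinished(Exception):
    """Raised when the producer job never signalled that it completed."""


def list_inputs(directory: Path) -> list:
    """Return the data files to process (hypothetical consumer-side helper).

    A missing directory or a missing success marker is treated as a failed
    (or still running) producer, not as "there was nothing to process".
    """
    if not directory.is_dir() or not (directory / SUCCESS_MARKER).exists():
        raise UpstreamNotFinished(f"no {SUCCESS_MARKER} marker in {directory}")
    # Skip bookkeeping files such as _SUCCESS or hidden .crc files
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and not p.name.startswith(("_", "."))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "matches"
        out.mkdir()
        # Producer crashed: the directory exists but is empty and has no marker
        try:
            list_inputs(out)
        except UpstreamNotFinished as err:
            print("refusing to run:", err)
        # Producer finished but genuinely produced no output
        (out / SUCCESS_MARKER).touch()
        print(list_inputs(out))
```

The point of the design is that "empty" and "failed" become distinguishable states: the consumer only concludes there is nothing to do when the producer has explicitly said it finished.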

Anyways, it could have been an engineer by mistake, an engineer trying to get promoted and increasing revenue numbers, or an action at the direction of management. Don't rule out the first option though...