r_klancer | 1 year ago
1. ACLs
2. The systems that provision those ACLs
3. The policies that determine the rules those systems follow
In other words, the model training batch job might run as a system user that has access to data annotated as 'interactions' (at timestamp T1 user U1 joined channel C1, at timestamp T2 user U2 ran a query that got 137 results), but no access to data annotated as 'content', like (certainly) message text or (probably) the text of users' queries. An RPC from the training job attempting to retrieve such content would be denied, just the same as if somebody tried to access someone else's DMs without being logged in as them.
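The annotation-based check described above can be sketched roughly like this. This is a minimal illustration, not Slack's actual system: the label names, service-account names, and the `AccessDenied` error are all made up for the example.

```python
# Hypothetical sketch of annotation-based ACLs: each field of a record is
# labeled, and each service account is only granted certain labels.

INTERACTIONS = "interactions"  # event metadata: joins, timestamps, result counts
CONTENT = "content"            # message text, query text, etc.

# Grants are provisioned by a separate system, not chosen by the job's author.
GRANTS = {
    "svc-model-training": {INTERACTIONS},
    "svc-message-delivery": {INTERACTIONS, CONTENT},
}

class AccessDenied(Exception):
    pass

def fetch(principal: str, record: dict) -> dict:
    """Return only the fields whose annotation the principal may read."""
    allowed = GRANTS.get(principal, set())
    visible = {k: v for k, (label, v) in record.items() if label in allowed}
    if not visible:
        raise AccessDenied(f"{principal} may not read any field of this record")
    return visible

record = {
    "joined_channel": (INTERACTIONS, {"user": "U1", "channel": "C1", "ts": "T1"}),
    "query_text": (CONTENT, "the text of the user's query"),
}

# The training job sees only the interaction metadata; the content field is
# silently withheld, and an all-content request would raise AccessDenied.
print(fetch("svc-model-training", record))
```

In a real deployment the denial would happen server-side at the RPC layer, of course, not in the caller's process.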
As a general rule in a big company, you, the engineer or product manager, don't get to decide what the ACLs look like, no matter how much you might want to. You request access for your batch job from some kind of system that provisions it, and in turn the humans who decide how that system works follow the policies set out by the company.
It's not unlike a bank teller who handles your account number. You generally trust them not to transfer your money to their personal account on the sly while they're tapping away at the terminal--not necessarily because they're law-abiding citizens who want to keep their job, but because the bank doesn't make it possible and/or would find out. (A mom-and-pop bank might not be able to make the same guarantee, but Bank of America does.) [*]
In the same vein, this is a statement that their system doesn't make it possible for some Slack PM to juice their team's OKRs by secretly training on customer data that other teams can't use, just because that particular PM felt like ignoring the policy.
[*] Not a perfect analogy, because a bank teller is like a Slack customer service agent who might, presumably after asking for your consent, be able to access messages on your behalf. But in practice I doubt there's a way for an employee to use their personal, probably very time-limited access to funnel that data to a model training job. And at a certain level of maturity a company (hopefully) also no longer makes it possible for a human employee to train a model in a random notebook using whatever personal data access they have been granted and then deploy that same model to prod. Startups might work that way, though.