I thought this was true, honestly, up until I read it just now. User data is explicitly one of the 3 training sources[^1]; forced opt-ins like "feedback" let them store it for 10 years[^2] and train on it[^3], and tripping the safety classifier lets them store it for 7 years[^2] and train on it[^3].
"Specifically, we train our models using data from three sources:...[3.] Data that our users or crowd workers provide"..."
[^2]
For all products, we retain inputs and outputs for up to 2 years and trust and safety classification scores for up to 7 years if you submit a prompt that is flagged by our trust and safety classifiers as violating our UP.
Where you have opted in or provided some affirmative consent (e.g., submitting feedback or bug reports), we retain data associated with that submission for 10 years.
[^3]
"We will not use your Inputs or Outputs to train our models, unless: (1) your conversations are flagged for Trust & Safety review (in which case we may use or analyze them to improve our ability to detect and enforce our Usage Policy, including training models for use by our Trust and Safety team, consistent with Anthropic’s safety mission), or (2) you’ve explicitly reported the materials to us (for example via our feedback mechanisms), or (3) by otherwise explicitly opting in to training."
This is a non-starter for every company I work with as a B2B SaaS dealing with sensitive documents. This policy doesn’t make any sense. OpenAI is guilty of the same. Just freaking turn this off for business customers. They’re leaving money on the table by effectively removing themselves from a huge chunk of the market that can’t agree to this single clause.
I haven't personally verified this, but I'm fairly positive the enterprise versions of these tools (ChatGPT, Gemini, Claude) not only don't train on document contents but also respect things like RBAC on documents for any integration.
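For what it's worth, the general pattern behind "respects RBAC" is just a permission check before retrieval results ever reach the model's context. This is not any vendor's actual implementation, and every name below is made up; it's a minimal sketch of the idea:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    content: str
    allowed_groups: set[str] = field(default_factory=set)  # groups allowed to read this doc

def filter_by_rbac(docs: list[Document], user_groups: set[str]) -> list[Document]:
    """Keep only the documents the requesting user is permitted to read."""
    return [d for d in docs if d.allowed_groups & user_groups]

def build_prompt(question: str, docs: list[Document], user_groups: set[str]) -> str:
    """Assemble the model prompt from RBAC-filtered context only."""
    visible = filter_by_rbac(docs, user_groups)
    context = "\n\n".join(f"[{d.doc_id}]\n{d.content}" for d in visible)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    corpus = [
        Document("handbook", "Employees accrue 20 PTO days per year.", {"all-employees"}),
        Document("acquisition-memo", "Confidential: pending acquisition terms...", {"execs"}),
    ]
    # A regular employee's query never gets the confidential memo into the model's context.
    print(build_prompt("What is the PTO policy?", corpus, {"all-employees"}))
```

The point is that the filter runs on the caller's identity, not the model's: restricted text that never enters the prompt can't leak through the completion or the provider's logs.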
Given the apparent technical difficulty of getting insight into a model’s underlying training data, how would anyone ever hold them to account if they violated this policy? Real question, not a gotcha; if corporate-backed IP holders are unable to prosecute claims against AI companies, it seems even more unlikely that individual paying customers would have greater success.
Even if this were true (and not hollowed out by various exceptions in Anthropic’s T&C), I would not call it “extremely strict”. How about zero retention?
refulgentis|1 year ago
pixelsort|1 year ago
Partly why I'm building a zero-trust product that keeps all your AI artifacts encrypted at rest.
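Not a description of that product specifically, but the general shape of "encrypted at rest" for AI artifacts is: the store only ever holds ciphertext, and decryption needs a key the storage provider doesn't have (client-held or in the customer's KMS). A minimal sketch using Python's cryptography library; the artifact naming and key handling are illustrative assumptions, not a real design:

```python
# pip install cryptography
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_artifact(plaintext: bytes, key: bytes, artifact_id: str) -> bytes:
    """Encrypt an AI artifact (prompt, output, embedding, ...) before storing it."""
    nonce = os.urandom(12)                        # unique nonce per encryption
    ct = AESGCM(key).encrypt(nonce, plaintext, artifact_id.encode())
    return nonce + ct                             # store the nonce alongside the ciphertext

def decrypt_artifact(blob: bytes, key: bytes, artifact_id: str) -> bytes:
    """Decryption requires the caller's key; the stored blob alone is useless."""
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, artifact_id.encode())

if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)     # illustrative: key would live client-side or in the customer's KMS
    blob = encrypt_artifact(b"model output quoting a sensitive document", key, "artifact-42")
    assert decrypt_artifact(blob, key, "artifact-42") == b"model output quoting a sensitive document"
```

Binding the artifact id in as AES-GCM associated data means a ciphertext can't be quietly reattached to a different record; whether the result counts as "zero-trust" still depends entirely on who holds the key.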
binarymax|1 year ago
phillipcarter|1 year ago
voltaireodactyl|1 year ago
saagarjha|1 year ago
anon373839|1 year ago
lazycog512|1 year ago