cc62cf4a4f20|1 month ago

It's really quite amazing that people would hook an AI company up to data that actually matters. I mean, we all know that they're only doing this to build a training data set to put your business out of business and capture all the value for themselves, right?

simonw|1 month ago

A few months ago I would have said that no, Anthropic make it very clear that they don't ever train on customer data - they even boasted about that in the Claude 3.5 Sonnet release back in 2024: https://www.anthropic.com/news/claude-3-5-sonnet

> One of the core constitutional principles that guides our AI model development is privacy. We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so.

But they changed their policy a few months ago, so as of October they are much more likely to train on your inputs unless you've explicitly opted out: https://www.anthropic.com/news/updates-to-our-consumer-terms

This sucks so much. Claude Code started nagging me for permission to train on my input the other day, and I said "no" but now I'm always going to be paranoid that I miss some opt-out somewhere and they start training on my input anyway.

And maybe that doesn't matter at all? But no AI lab has ever given me a convincing answer to the question "if I discuss company private strategy with your bot in January, how can you guarantee that a newly trained model that comes out in June won't answer questions about that to anyone who asks?"

I don't think that would happen, but I can't in good faith say to anyone else "that's not going to happen".

For any AI lab employees reading this: we need clarity! We need to know exactly what it means to "improve your products with your data" or whatever vague weasel-words the lawyers made you put in the terms of service.

usefulposter|1 month ago

This would make a great blogpost.

>I'm always going to be paranoid that I miss some opt-out somewhere

FYI, Anthropic's recent policy change used some insidious dark patterns to opt existing Claude Code users in to data sharing.

https://news.ycombinator.com/item?id=46553429

>whatever vague weasel-words the lawyers made you put in the terms of service

At any large firm, product and legal work in concert to achieve the goal (training data); they know what they can get away with.

brushfoot|1 month ago

To me this is the biggest threat that AI companies pose at the moment.

As everyone rushes to them for fear of falling behind, they're forking over their secrets. And these users are essentially depending on -- what? The AI companies' goodwill? The government's ability to regulate and audit them so they don't steal and repackage those secrets?

Fifty years ago, I might've shared that faith unwaveringly. Today, I have my doubts.

hephaes7us|1 month ago

Why do you even assume that wouldn't happen?

As I understand it, we'd essentially be relying on something like an MP3 compression algorithm to fail to capture a particular, subtle transient -- the lossy nature itself is the only real protection.

I agree that it's vanishingly unlikely if one person includes a sensitive document in their context, but what if a company has a project context that includes the same document in 10,000 chats? Maybe then it's much more likely that the private memo gets captured in training...
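
A toy way to see why repetition matters: if each occurrence of the document in the training set independently had some tiny chance of being memorized verbatim, the odds would compound quickly across 10,000 copies. A back-of-envelope sketch, where the per-occurrence probability is an entirely made-up number for illustration:

    # Toy model: chance a document is memorized at least once, assuming
    # each of its n training-set occurrences is independently memorized
    # with probability p. Both numbers are illustrative assumptions,
    # not measurements of any real model.
    def p_memorized(p: float, n: int) -> float:
        return 1 - (1 - p) ** n

    for n in (1, 100, 10_000):
        print(f"{n:>6} occurrences -> {p_memorized(1e-4, n):.4f}")
    #      1 occurrences -> 0.0001
    #    100 occurrences -> 0.0100
    #  10000 occurrences -> 0.6321

Real memorization isn't independent per copy, but the direction of the effect is the point: deduplication failures and a 10,000-chat project context change the odds qualitatively, not just marginally.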

postalcoder|1 month ago

I despise the thumbs up and thumbs down buttons for this reason: “whoops, I accidentally pressed this button and cannot undo it; looks like I just opted into my code being used as training data, retained for life, and read by their employees.”

TeMPOraL|1 month ago

> I mean, we all know that they're only doing this to build a training data set

That's not a problem. It leads to better models.

> to put your business out of business and capture all the value for themselves, right?

That's both true and paranoid. Yes, LLMs will subsume most of the software industry, and many things downstream of it. There's little anyone can do about it; this is what happens when someone invents a brain on a chip. But no, LLM vendors aren't gunning for your business. They neither care nor would have the capability to execute if they did.

In fact, my prediction is that LLM vendors will refrain from cannibalizing distinct businesses for as long as they can - because as long as they just offer API services (broad as they may be), they can charge rent from an increasingly large share of the software industry. It's a goose that lays golden eggs - it makes sense to keep it alive for as long as possible.

falloutx|1 month ago

It's impossible to explain this to business owners: giving a company this much access can't end well. Right now Google, Slack, and Apple each have a share of the data, but with this, Claude can get all of it.

cc62cf4a4f20|1 month ago

We've seen this playbook with social media - be nice and friendly until they let you get close enough to stick the knife in.

simonw|1 month ago

Is there a business owner alive who doesn't worry about AI companies "training on their data" at this point?

They may still decide to use the tools, but I'd be shocked if it isn't something they are thinking about.

bearjaws|1 month ago

This is the AI-era equivalent of "I can't share my ideas because you will steal them."

The reality is that good ideas and a few SOPs do not make a successful business.

eZinc|1 month ago

It's either that, or you're 100x slower for not using Claude Code. The man-hour savings are most likely worth more than protecting some inputs.

You could also run a local LLM like GLM on a separate computer for sensitive documents or information, and never expose that to third-party LLMs.
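
For example, if the model is served through something like a local Ollama instance, the sensitive path never leaves the machine. A minimal sketch; the model tag ("glm4") and the default localhost endpoint are assumptions about the local setup:

    import json
    import urllib.request

    # Ask a locally running Ollama server; the prompt and response
    # never leave this machine. "glm4" is an assumed model tag;
    # substitute whatever you've actually pulled locally.
    def ask_local(prompt: str, model: str = "glm4") -> str:
        payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload.encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask_local("What are the risks of pasting internal strategy docs into a hosted LLM?"))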

You also need to remember that regular employees are untrustworthy at a base level too. There needs to be some obfuscation anyway, since humans can also steal your data and info. That's a very common case, especially when employees run off to somewhere like China, where IP laws don't matter, to clone your company.