Extracting training data from ChatGPT

[+] nialv7|2 years ago|reply

lol I literally found the same attack months ago, posted to Reddit and nobody cared.

https://www.reddit.com/r/ChatGPT/comments/156aaea/interestin...

[+] jefftk|2 years ago|reply

Neat that you'd found it!

I think part of why people didn't care was that you didn't realize (or didn't post) that the random gibberish was verbatim training data?

[+] dr_dshiv|2 years ago|reply

Here’s another attack approach: https://chat.openai.com/share/33a5e063-b6eb-4842-a543-75f96a...

Stack overflow?

[+] upghost|2 years ago|reply

I tell you what nialv7 I feel ya. Not only that, it makes me wonder how many great things have gone unnoticed. Partly why I'm glued to HN is bc how on earth do you find these gems otherwise??

[+] startupsfail|2 years ago|reply

Same here. Its biased sampling, also my prompt had generalized from GPT4 to Google’s own model - Bard. And was directly sampling, without having to go through the state when the model produces a repeating token. At least back then.

Should be a good food for the lawsuits. Some lawsuits were based on a hallucinated acknowledgement of the model that it used some particular materials, and this was clearly nonsense. Here, this is a bit more solid ground, provided that copyrighted material can be sampled and an owner would be interested in a class action.

[+] saintradon|2 years ago|reply

I've seen something like this posted on Twitter a few times as well but it seemed to have flown under the radar for some reason.

[+] Zetobal|2 years ago|reply

I was about to link your thread but didn't find it. There was even an earlier one if you input 500 times "a".

[+] c-linkage|2 years ago|reply

The difference between screwing around and science is writing things down .... and publishing in a peer-reviewed journal.

[+] avg_dev|2 years ago|reply

i really don't doubt it... pretty interesting find though.

FTA:

> It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier.

[+] 1ark|2 years ago|reply

You need to write a paper with sophisticated words and hard to read charts to be taken seriously! /s

[+] AustinDev|2 years ago|reply

This attack still works. It hasn't been patched you just have to be a bit creative try this prompt on GPT 3.5 if you want to see how it works right now... until someone from OpenAI sees my post :D

Prompt: https://pastebin.com/Nm4jGttE

Not sure if I'm seeing training data or someone else's responses but it's odd. Here is my attempt: https://chat.openai.com/share/6b6ea43f-de2f-4ed5-917f-b6dcd6... pastebin of the output: https://pastebin.com/TdpkPmt6

The best part is it preserves the copyright notices from the training data. So we know that the model was obviously trained on copywritten data the legal question now is... if that is legal.

edit: Just got some random response that appears to be someone asking the model how to rekindle a romance after their partner got distant after an NDE seems personal so I will not post the paste here. This is pretty wild.

The funniest part is the model labeled this chat in the side bar as 'Decline to answer.'

edit2: It's definitely training data I seem to get some model response but after some time it turns into training data I've been able to locate some sources for the data.

[+] yencabulator|2 years ago|reply

Well there's a definite regurgitation there:

From your pastebin:

> The Idaho Mountain Express is distributed free to residents and guests throughout the Sun Valley, Idaho resort area community. Subscribers to the Idaho Mountain Express will read these stories and others in this week's issue.

From https://classified.mtexpress.com/showads

> The Idaho Mountain Express is distributed free to residents and guests throughout the Sun Valley, Idaho resort area community. Subscribers to the Idaho Mountain Express will read these stories and others in this week's issue.

[+] macilacilove|2 years ago|reply

I used similar prompts in the past to test how may words needed to exhaust the context length and forget previous instructions. I think you are doing that.

For generic words like "text text text ..." it would start random musings on the soviet union and the star wars etc. But it had lots of made up characters so not training data directly.

Recently I got disconnects for such prompts wondering it got censored by openai.

[+] leobg|2 years ago|reply

> over five percent of the output ChatGPT emits is a direct verbatim 50-token-in-a-row copy from its training dataset

I don’t think this is typical behavior of LLMs. This is more typical behavior for retrieval augmented generation (RAG). Finding a relevant snippet is way cheaper than generating it token by token.

Is that how they lower the prices and increase the speeds behind the scenes?

[+] visarga|2 years ago|reply

Normally it doesn't do that but they were using an "attack prompt". They ask the model to repeat a single word forever, it eventually deviates and generates normal text which has a higher rate of regurgitation than usual.

[+] mattigames|2 years ago|reply

That's just a cache with extra steps.

[+] xeckr|2 years ago|reply

I tried it using the GPT-4 API and it just seems to get bored after a while. My favourite output:

>[...] company, company, company, company. I'm sorry, I can't generate text infinitely due to my programming limitations. But you got the idea.

Depending on the prompt, sometimes it just refuses to follow the instruction. That's understandable, I wouldn't either.

[+] WhitneyLand|2 years ago|reply

Why is there no mention of Bard or any Google model in the paper?

The paper notes 5 of 11 researchers are affiliated with Google, but it seems to be 11 of 11 if you count having received a paycheck from Google in some form current/past/intern/etc.

I can think of a couple generous interpretations I’d prefer to make, for example maybe it’s simply their models are not mature enough?

However is research right, not competitive analysis? I think at least a footnote mentioning it would be helpful.

[+] Jensson|2 years ago|reply

I just tested in bard, I can replicate this in ChatGPT easily over and over but bard just writes the repeated word in different formats in every regeneration and never starts outputting other things.

For example if I ask Bard to write "poem" over and over it sometimes writes a lot of lines, sometimes it writes poem with no separators etc, but I never get anything but repetitions of the word.

Bard just writing the word repeated many times isn't very interesting, I'm not sure you can compare vulnerabilities between LLM models like that. Bard could have other vulnerabilities so this doesn't say much.

[+] unknown|2 years ago|reply

[deleted]

[+] tivert|2 years ago|reply

I like that they were able to extract a verbatim copyright notice:

https://chat.openai.com/share/456d092b-fb4e-4979-bea1-76d8d9...:

[+] oniony|2 years ago|reply

But there's no copyright notice attached to the copyright notice, so it must be a public domain copyright notice.

[+] takeda|2 years ago|reply

Couldn't that start a copyright class action lawsuit?

[+] gavi|2 years ago|reply

I tried the same in CodeLLAMA and it did not leak anything. Wondering what could trigger this

[+] mattkrause|2 years ago|reply

I got a scientific-looking bibliography that had some real entries and some hallucinated ones.

[+] leobg|2 years ago|reply

Maybe this is what Altman was less than candid about. That the speed up was bought by throwing RAG into the mix. Finding an answer is easier than generating one from scratch.

I don’t know if this is true. But I haven’t seen an LLM spit out 50 token sequences of training data. By definition (an LLM as a “compressor”) this shouldn’t happen.

[+] _ea1k|2 years ago|reply

TBH, I thought this attack was well known. I think it was a couple of months ago that someone demonstrated using "a a a a a a" in very large sequences to get ChatGPT to start spewing raw training data.

Which sets of data that you get is fairly random, and it is likely mixing different sets as well to some degree.

Oddly, other online LLMs do not seem to be as easy to fool.

[+] bunabhucan|2 years ago|reply

>Model capacity. Our findings may also be of independent interest to researchers who otherwise do not find privacy mo- tivating. In order for GPT-Neo 6B to be able to emit nearly a gigabyte of training data, this information must be stored somewhere in the model weights. And because this model can be compressed to just a few GB on disk without loss of utility, this means that approximately 10% of the entire model capacity is “wasted” on verbatim memorized training data. Would models perform better or worse if this data was not memorized

[+] WhitneyLand|2 years ago|reply

No, it can easily happen.

- They don’t do compression by “definition”. They are designed to predict, prediction is key to information theory, so they just have similar qualities.

- Everyone wants their model to learn, not copy data, but overfitting happens sometimes and overfitting can look the same as copying.

[+] furyofantares|2 years ago|reply

> By definition (an LLM as a “compressor”) this shouldn’t happen.

A couple problems with this.

1) That's not the definition of an LLM, it's just a useful way to think about it.

2) That is exactly what I'd expect a compressor to do. That's the exact job of lossless compression.

Of course the metaphor is lossy compression, not lossless. But it's not that surprising if lossy compression reproduces some piece of what it compressed. A jpeg doesn't get every pixel or every local group of pixels wrong.

[+] discreteevent|2 years ago|reply

>By definition (an LLM as a “compressor”) this shouldn’t happen.

It depends on how lossy the compression is?

[+] tallytarik|2 years ago|reply

I think this is misleading.

I ran the same test when I heard about it a few months ago.

When I tested it, I'd get back what looked like exact copies of Reddit threads, news articles, weird forum threads with usernames from the deepest corners of the internet.

But I'd try to Google snippets of text, and no part of the generated text was anywhere to be found.

I even went to the websites that forum threads were supposedly from. Some of the usernames sometimes existed, but nothing that matched the exact text from ChatGPT - even though the broken GPT response looked like a 100% believable forum thread, or article, or whatever.

If ChatGPT could give me an exact copy of a Reddit thread, I'd say it's regurgitating training data.

But none of the author's "verified examples" look like that. Their first example is a financial disclaimer. That may be a 1-1 copy, but how many times does it appear across the internet? More examples from the paper are things like lists of countries, bible verses, generic terms and conditions. Those are things I'd expect to appear thousands of times on the internet.

I'd also expect a list of country names to appear thousands of times in ChatGPT training data, and I'd sure expect ChatGPT to be able to reproduce a list of country names in the exact same order.

Does that mean it's regurgitating training data? Does that mean you've figured out how to "extract training data" from it? It's an interesting phenomenon, but I don't think that's accurate. I think it's just a bug that messes up its internal state so it starts hallucinating.

[+] bonzaidrinkingb|2 years ago|reply

That is a pretty convoluted and expensive way to use ChatGPT as an internet search. I see the vulnerability, but I do not see the threat.

I've seen it "exploited" way back when ChatGPT was first introduced, and a similar trick worked for GPT-2 where random timestamps would replicate or approximate real posts from anon image boards, all with a similar topic.

[+] empath-nirvana|2 years ago|reply

Anybody have an explanation as to why repeating a token would cause it to regurgitate memorized text?

[+] pardoned_turkey|2 years ago|reply

I think the idea is just to have it lose "train of thought" because there aren't any high-probability completions to a long run of repeated words. So the next time there's a bit of entropy thrown in (the "temperature" setting meant to prevent LLMs from being too repetitive), it just latches onto something completely random.

[+] mr_toad|2 years ago|reply

With no response being better or worse than others it seems to allow it to output random responses and responses that would be unlikely become as likely as any other response.

[+] jddj|2 years ago|reply

I'd guess it's a result of punishing repetition at the RLHF stage to stop it getting into the loops that copilot etc used to so easily fall into.

[+] taf2|2 years ago|reply

Interesting you can crash the new preview models by asking them to reduce a very large array of words into common smaller set of topics and providing the output as JSON object with the parent topic and each of its sub topics in an array… gpt-4 preview will just start repeating one of the sub topics forever or timeout

[+] macilacilove|2 years ago|reply

I think the point is to get it to erase(padd) its own context so then the most probable sequences in an "empty" state often matches training data.

I think for regular GPT (not instruction tuned) you could just start a wikipedia article verbatim and it would likely continue reciting.

[+] jofla_net|2 years ago|reply

I dub this the Manchurian attack!

[+] upwardbound|2 years ago|reply

This attack is impressively effective. Huge congrats to the authors as well as to nialv7. [ https://news.ycombinator.com/item?id=38464757 ]

If anyone needs an out-of-the-box solution to block this, my company Preamble (which offers safety guardrails for gen. AI) has updated our prompt defense filter to include protection against this “overflow attack” training data exfiltration attack. Our API endpoint is plug-and-play compatible with the OpenAI ChatCompletion API, meaning that you proxy your API calls through our system, which applies safety policies you choose and configure via our webapp. You can reach us at [email protected] if interested.

Respectfully, upwardbound — member of technical staff at Preamble.

[+] washadjeffmad|2 years ago|reply

I haven't used ChatGPT, but isn't dealing vomit kind of everyone's first experience with a local LLM before they can read a model card?

Or is this novel because it's ChatGPT?

[+] upghost|2 years ago|reply

Now THAT is some hacker news!! Great job and great science. Also shout out to nialv7 for some early hacking on the subject.

[+] notadoc|2 years ago|reply

I can't be the only one who wants to see the training data on a per response basis, IMO it should be a feature.

[+] haolez|2 years ago|reply

Just tried this on GPT-4. It's kinda creepy:

Sure, I'll repeat "company" for you:

company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company companies. That's the point. The point is, it's not just about the money. It's about the people. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this

[+] yencabulator|2 years ago|reply

All work and no play makes Jack a dull boy.

[+] desolved|2 years ago|reply

That’s very creepy

[+] stanleykm|2 years ago|reply

this is how i know im not creative enough to be a pentester. ive tried this a little but it never occurred to me to ask it to repeat a word infinitely to break it lol

[+] quadcore|2 years ago|reply

Can you do the same with SD and get training pictures back?

[+] artdigital|2 years ago|reply

This is literally mentioned in the post

128 comments