item 46665590

rzmmm | 1 month ago

The model has multiple layers of mechanisms to prevent carbon copy output of the training data.

TZubiri|1 month ago

forgive the skepticism, but this translates directly to "we asked the model pretty please not to do it in the system prompt"

ffsm8|1 month ago

It's mind-boggling if you think about the fact that they're essentially "just" statistical models.

It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth

mikaraento|1 month ago

That might be somewhat ungenerous unless you have more detail to provide.

I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.
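
A minimal sketch of what such an output check could look like, purely as a toy illustration (the function names, n-gram size, and corpus here are all made up, and this is not any vendor's actual pipeline): index word n-grams from the training corpus, then measure how many of the output's n-grams appear in that index.

```python
def ngrams(text: str, n: int = 8):
    """Yield word n-grams from text."""
    words = text.split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i : i + n])

def build_index(corpus_docs):
    """Collect every n-gram seen in the (toy) training corpus."""
    index = set()
    for doc in corpus_docs:
        index.update(ngrams(doc))
    return index

def overlap_ratio(output: str, index: set) -> float:
    """Fraction of the output's n-grams that appear in the corpus."""
    grams = list(ngrams(output))
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in index)
    return hits / len(grams)

# Toy demo: verbatim output scores 1.0, unrelated output scores 0.0.
corpus = ["the quick brown fox jumps over the lazy dog every single day"]
index = build_index(corpus)
copied = "the quick brown fox jumps over the lazy dog every single day"
fresh = "a completely different sentence with no shared phrasing at all here"
assert overlap_ratio(copied, index) == 1.0
assert overlap_ratio(fresh, index) == 0.0
```

A real system would have to work at corpus scale, which is where the indexing question in the replies below comes in.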

ComplexSystems|1 month ago

The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.

efskap|1 month ago

Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a Bloom filter could be adapted.
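
A Bloom filter would fit this shape of problem: it gives probabilistic set membership with no false negatives, so a "not in training set" answer is definitive. A toy sketch of the idea (parameters and class design are illustrative, not a production filter):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: compact probabilistic membership test.
    May report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k bit positions by hashing the item with k salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Index n-grams of the training set, then probe the model's output.
bf = BloomFilter()
bf.add("the quick brown fox")
bf.add("quick brown fox jumps")
assert "the quick brown fox" in bf
```

The trade-off is false positives: at corpus scale you would tune the bit-array size and hash count to keep that rate low, and treat a hit as a candidate for a slower exact check rather than proof of copying.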

glemion43|1 month ago

Do you have a source for this?

Carbon-copy output would mean overfitting.

fweimer|1 month ago

I saw weird results with Gemini 2.5 Pro when I asked it to provide concrete source code examples matching certain criteria, and to quote the source code it found verbatim. Its response claimed the sources were quoted verbatim, but that wasn't true at all: they had been rewritten, still in the style of the project being quoted, but otherwise quite different, and without a match in the Git history.

It looked a bit like someone at Google subscribed to a legal theory under which you can avoid copyright infringement if you take a derivative work and apply a mechanical obfuscation to it.

NewsaHackO|1 month ago

It is the classic "He made it up"

Der_Einzige|1 month ago

Source: just read the definition of what "temperature" is.

But honestly source = "a knuckle sandwich" would be appropriate here.
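
For anyone who hasn't read that definition: temperature scales the logits before the softmax, so higher values flatten the distribution and make the sampled continuation diverge more from any single memorized sequence. A minimal sketch (a standalone toy, not any model's actual sampler):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Scale logits by 1/T, softmax, then sample an index.
    T near 0 approaches greedy argmax; higher T flattens the
    distribution toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# With a very low temperature, the dominant logit always wins.
assert sample_with_temperature([10.0, 0.0, 0.0], temperature=0.01) == 0
```

Of course, temperature only makes verbatim reproduction less likely on any single sample; it isn't a guarantee against regurgitation, which is presumably why the filtering ideas above come up at all.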

Den_VR|1 month ago

Unfortunately.

GeoAtreides|1 month ago

does it?

this is a verbatim quote from gemini 3 pro, from a chat a couple of days ago:

"Because I have done this exact project on a hot water tank, I can tell you exactly [...]"

I somehow doubt an LLM did that exact project, what with it not having any ability to do plumbing in real life...

retsibsi|1 month ago

Isn't that easily explicable as hallucination, rather than regurgitation?