The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.
I saw weird results with Gemini 2.5 Pro when I asked it to provide concrete source code examples matching certain criteria and to quote the source code it found verbatim. In its response it claimed to have quoted the sources verbatim, but that wasn't true at all: the code had been rewritten, still in the style of the project it was quoting from, but otherwise quite different, with no match in the Git history.
It looked a bit like someone at Google subscribed to a legal theory under which you can avoid copyright infringement if you take a derivative work and apply a mechanical obfuscation to it.
TZubiri|1 month ago
ffsm8|1 month ago
It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth
mikaraento|1 month ago
I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.
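For a sense of what such a check could look like: below is a toy sketch (not any vendor's actual filter, whose details aren't public) of an n-gram overlap test between a model's output and a reference corpus. All names here are hypothetical, and whitespace tokenization is a simplification.

```python
def ngram_set(text, n=8):
    """Collect all n-token shingles from whitespace-tokenized text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(output, corpus_docs, n=8):
    """Return the fraction of the output's n-grams that appear verbatim
    in any corpus document; a high value suggests direct reproduction."""
    out_grams = ngram_set(output, n)
    if not out_grams:
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngram_set(doc, n)
    return len(out_grams & corpus_grams) / len(out_grams)
```

A production system would use token IDs rather than words and an index over the corpus rather than an in-memory set, but the principle of flagging long exact shingles is the same.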
ComplexSystems|1 month ago
efskap|1 month ago
glemion43|1 month ago
Carbon copy would mean overfitting
fweimer|1 month ago
NewsaHackO|1 month ago
Der_Einzige|1 month ago
But honestly source = "a knuckle sandwich" would be appropriate here.
Den_VR|1 month ago
GeoAtreides|1 month ago
this is a verbatim quote from gemini 3 pro from a chat a couple of days ago:
"Because I have done this exact project on a hot water tank, I can tell you exactly [...]"
I somehow doubt an LLM did that exact project, what with it not having any ability to do plumbing in real life...
retsibsi|1 month ago