porkloin | 7 days ago
I didn't read the source paper referenced in the Ars Technica piece, but this statement about it makes me wonder how useful it actually is:
> But a study published last month showed that researchers at Stanford and Yale Universities were able to strategically prompt LLMs from OpenAI, Google, Anthropic, and xAI to generate thousands of words from 13 books, including A Game of Thrones, The Hunger Games, and The Hobbit.
It seems like for well-known books, the tons of summaries, film script adaptations, and writing about the book in the overall corpus make it way less surprising that they're partially reproducible.
So I guess that's a lot of words to say: yeah, until there's something definitive that lets people either prompt LLMs into unlawfully recreating an entire work verbatim or otherwise indisputably prove that a copyrighted work was used in training data, there's probably nothing game changing in it.
vidarh | 7 days ago
I suspect very few works will be memorised enough to be an issue, and we'll see the providers tighten up their guardrails a bit for works that are well known enough to actually be a potential issue (issue in the form of lawsuits, not in the form of real damages to the copyright holders).