fuzzbazz|8 months ago
Is it plausible that an LLM ingested parts of the book by scraping web pages like this one, rather than the full copyrighted book, and still produced results similar to those of the linked study?
[1] https://www.goodreads.com/work/quotes/4640799-harry-potter-a...
[2] ~30 portions x 68 pages
paxys|8 months ago
https://www.wired.com/story/new-documents-unredacted-meta-co...
aprilthird2021|8 months ago
aspenmayer|8 months ago
https://www.reddit.com/r/DataHoarder/comments/1entowq/i_made...
https://github.com/shloop/google-book-scraper
That Meta torrented Books3 and other datasets appears to rest on admissions by the Meta employees who did the work or oversaw those who did, so it is not really in dispute or ambiguous.
https://torrentfreak.com/meta-admits-use-of-pirated-book-dat...
redox99|8 months ago