joenot443 | 23 days ago
This definitely raises an interesting question. It seems like a good chunk of popular literature (especially from the 2000s) exists online in big HTML files. House of Leaves, Infinite Jest, Harry Potter, and basically any Stephen King book come immediately to mind - they've all been posted at some point.
Do LLMs have a good way of inferring where knowledge from the context begins and knowledge from the training data ends?
rendx | 23 days ago
Anna's Archive alone claims to currently publicly host 61,654,285 books, more than 1PB in total.