top | item 44181329

underlines | 9 months ago

Yes, every major LLM company did it:

They illegally used Anna's Archive, The Pile, Common Crawl, their own crawls, Books2, LibGen, etc., embedded the text into a high-dimensional space, and trained next-token prediction on it.
