I think the point here is that the training data was procured in violation of copyright law ("piracy"), not that the training itself is piracy.

The suspicion[0] is that OpenAI trained its models on a large text dump that includes libgen (in the so-called "books2").

If a person downloads a book from Library Genesis, they're a pirate; if OpenAI does it, so are they.

[0] https://twitter.com/theshawwn/status/1320282152689336320
