

rst | 1 year ago

OpenAI is extremely cagey about what's in their training data set generally, but absent more specific info, they're widely assumed to be grabbing whatever they can. (Notably, that includes copyrighted material used without explicit authorization. I'll take no position on the legal issues in the New York Times's lawsuit against OpenAI, but at the very least, getting their models to regurgitate NYT articles verbatim demonstrates pretty clearly that those articles are in the training set.)
