StrangeATractor | 2 years ago
I wonder if opening GPT and DALL-E to the public was partly intended to pollute subsequent data for anyone who gets into AI down the road. Suddenly a lot of publicly accessible data is worth less, leaving only players with a hoard of time-stamped data (like Google and Facebook) able to compete. OpenAI almost certainly has the hashes of what it spits out too, so they'll be able to sort the wheat from the chaff for a while yet.
The market for data may be getting interesting.
sebzim4500|2 years ago
Normal hashes are extremely fragile (change a single character and the hash no longer matches), so they'd have to use something more sophisticated. Scott Aaronson said in a podcast a few months ago that OpenAI had implemented such a system, but at the time they had not yet decided whether to start using it.
The purpose being discussed at the time was to provide a tool for educators to detect cheating, but presumably it could also be used for filtering future datasets.
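To make the fragility point concrete, here is a rough sketch in Python. It is not a description of whatever OpenAI built; the word-shingle Jaccard comparison is just one generic fuzzy-matching technique swapped in for illustration, and the helper names (`sha256_hex`, `shingles`, `jaccard`) and the sample sentences are made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Exact-match fingerprint: any edit at all yields an unrelated digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word windows (hypothetical helper for this sketch)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Fraction of k-word shingles the two texts share (1.0 means identical sets)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


# Made-up sample text: the second string differs only in its last word.
original = "The quick brown fox jumps over the lazy dog near the quiet river bank"
edited = "The quick brown fox jumps over the lazy dog near the quiet river shore"

# The exact hashes no longer match, so a lookup table of hashes misses the edit.
print(sha256_hex(original) == sha256_hex(edited))

# The shingle overlap stays high, so a fuzzy comparison still flags the pair.
print(round(jaccard(original, edited), 3))
```

A one-word edit is enough to defeat a lookup by exact hash, while the overlap score barely moves. Heavier paraphrasing would defeat this simple overlap measure too, which is presumably part of why something more sophisticated would be needed.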
brucethemoose2|2 years ago