StrangeATractor | 2 years ago

Hah, I brought this up here a few months ago and was quickly dismissed.

I wonder if opening GPT and DALL-E to the public was partly intended to pollute subsequent data for anyone who gets into AI down the road. Suddenly a lot of publicly accessible data is worth less, so the only players left able to compete are those with a hoard of time-stamped data (like Google, Facebook). OpenAI almost certainly has the hashes of what it spits out too, so they'll be able to sort the wheat from the chaff for a while yet.

The market for data may be getting interesting.

sebzim4500|2 years ago

>OpenAI almost certainly has the hashes of what it spits out too, so they'll be able to sort the wheat from the chaff for a while yet.

Normal hashes are extremely fragile, so they'd have to use something more sophisticated. Scott Aaronson said in a podcast a few months ago that OpenAI has implemented such a system but at the time they had not decided to start using it.

The purpose being discussed at the time was to provide a tool for educators to detect cheating, but presumably it could also be used for filtering future datasets.
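
To illustrate the fragility point, here is a minimal sketch comparing an exact hash with a fuzzy overlap measure. It is not the system mentioned above, whose details are not given here; the helper names (`sha256_hex`, `word_shingles`, `jaccard`) and the sample sentences are made up for the example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Exact cryptographic hash: any change to the text changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def word_shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word windows (hypothetical helper for this sketch)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = word_shingles(a, k), word_shingles(b, k)
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog near the river bank"
edited = "the quick brown fox jumps over the lazy dog near the river bend"

# One changed word: the digests no longer match at all.
print(sha256_hex(original) == sha256_hex(edited))

# The shingle overlap still shows the two texts are nearly the same.
print(round(jaccard(original, edited), 3))
```

An exact-hash lookup only catches verbatim copies, while an overlap score like this degrades gradually as text is edited, which is the kind of property a more sophisticated scheme would need.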

brucethemoose2|2 years ago

It's older than that: I ran into this finetuning ESRGAN on itself. Distortion is rapidly amplified in successive generations, even when you pixel peep and can barely see it in the ESRGAN-generated source.
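
A toy sketch of that feedback loop, with heavy assumptions: it has nothing to do with ESRGAN or images. The "model" here is just a Gaussian fitted to its training data, and each generation is trained only on samples drawn from the previous generation's fit. The function name and parameters are made up for illustration.

```python
import random
import statistics


def self_training_loop(generations=20, sample_size=50, seed=0):
    """Refit a Gaussian on its own samples and record (mean, std) per generation."""
    rng = random.Random(seed)
    # Generation 0 trains on "real" data drawn from N(0, 1).
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    history = []
    for _ in range(generations + 1):
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append((mean, std))
        # The next generation only ever sees output of the current fit.
        data = [rng.gauss(mean, std) for _ in range(sample_size)]
    return history


for gen, (mean, std) in enumerate(self_training_loop()):
    print(f"gen {gen:2d}  mean={mean:+.3f}  std={std:.3f}")
```

Because every generation inherits the previous one's estimation error and adds its own, the fitted parameters tend to wander away from the original distribution rather than stay put, even though any single step looks like a small change.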