top | item 31487326

(no title)

qz_kb | 3 years ago

I have to wonder how much releasing these models will "poison the well" and fill the internet with AI generated images that make training an improved model difficult. After all if every 9/10 "oil painted" image online starts being from these generative models it'll become increasingly difficult to scrape the web and to learn from real world data in a variety of domains. Essentially once these things are widely available the internet will become harder to scrape for good data and models will start training on their own output. The internet will also probably get worse for humans since search results will be completely polluted with these "sort of realistic" images which can ultimately be spit out at breakneck speed by smashing words from a dictionary together...

discuss

rhacker|3 years ago

Look at carpentry blogs, recipe blogs. Nearly all of it is junk content. I bet if you combined GPT and imagen or dalle2 you could replace all of them. Just provide a betty crocker recipe and let it generate a blog that has weekly updates and even a bunch of images - "happy family enjoying pancakes together"

I can see the future as being devoid of any humanity.

slimsag|3 years ago

I wrote a comedic "Best Apache Chef recipe" article[1] mocking these sites.

I guess the concern would be: If one of these recipe websites _was_ generated by an AI, the ingredients _look_ correct to an AI but are otherwise wrong - then what do you do? Baking soda swapped with baking powder. Tablespoons instead of teaspoons. Add 2tbsp of flower to the caramel macchiato. Whoops! Meant sugar.

[0] http://slimsag.com/best-apache-chef-recipe/1438731.htm

thelittleone|3 years ago

Seeing this a lot on youtube also. Scripts pulling in "news" from a source as a script for a robo voice combined with "related" images stitched together randomly.

exikyut|3 years ago

"Picture of happy nuclear family enjoying paperclip maximization at the beach"

animal_spirits|3 years ago

The future digital landscape might be void of humanity, but there will still be real humans living next door to you ;)

rmbyrro|3 years ago

I see the opposite future.

As AI advances, a lot of people will look after experiencing life outside the digital world.

Even digital communication will not be trustworthy anymore with deepfaces and everything else, so people will want to get together more often.

Edit: for the lazy ones, yeah, digital will be a sad and heartless environment...

4m1rk|3 years ago

Doesn't it increases the value of genuine human-produced content? Or their NFTs!

kimi|3 years ago

> I can see the future as being devoid of any humanity.

Considering how many of the readers of said blog will be scrapers and bots, who will use the results to generate more spammy "content", I think you are right.

joshspankit|3 years ago

I’d much rather skip the blog format and replace them with an AI that can answer “Please provide a pie recipe like my grandparent’s”, or “I’d like to make these ribs on the BBQ so that they come out flavourful, soft, and a little sweet.”

walt74|3 years ago

>I can see the future as being devoid of any humanity.

I can see a past where this already happened, to paraphrase Douglas Adams ;)

rg111|3 years ago

People training newer models just have to look for the "Imagen" tag or the Dall-E2 rainbow at the corner and heuristically exclude images having these. This is trivial.

Unless you assume there are bad actors who will crop out the tags. Not many people now have access to Dall-E2 or will have access to Imagen.

As someone working in Vision, I am also thinking about whether to include such images deliberately. Using image augmentation techniques is ubiquitous in the field. Thus we introduce many examples for training the model that are not in the distribution over input images. They improve model generality by huge margins. Whether generated images improve generality of future models is a thing to try.

Damn I just got an idea for a paper writing this comment.

SirHound|3 years ago

Most images you see from these services will not have a watermark on them. Cropping is trivial.

viraptor|3 years ago

> Unless you assume there are bad actors who will crop out the tags.

I don't know why people do that but lots of randoms on the internet do that and they're not even bad actors per se. The removed signatures from art posted online became a kind of a meme itself. Especially when comic strips are reposted on Reddit. So yeah, we'll see lots of them.

zone411|3 years ago

In my melody generation system I'm already including melodies that I've judged as "good" (https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y...) in the updated training set. Since the number of catchy melodies that have been created by humans is much, much lower than the number of pretty images, it makes a significant difference. But I'd expect that including AI-generated images without human quality judgement scores in the training set won't be any better than other augmentation techniques.

JayStavis|3 years ago

Huh, I had never thought of that. Makes it seem like there's a small window of authenticity closing.

The irony is that if you had a great discriminator to separate the wheat from the chaff, that it would probably make its way into the next model and would no longer be useful.

My only recommendation is that OpenAI et al should be tagging metadata for all generated images as synthetic. That would be a really interesting tag for media file formats (would be much better native than metadata though) and probably useful across a lot of domains.

joshspankit|3 years ago

The OpenAI access agreement actually says that you must add (or keep?) a watermark on any generated images, so you’re in good company with that line of thinking.

agar|3 years ago

The irony is that when the majority of content becomes computer-generated, most of that content will also be computer-consumed.

Neil Stephenson covered this briefly in "Fall; or Dodge In Hell." So much 'net content was garbage, AI-generated, and/or spam that it could only be consumed via "editors" (either AI or AI+human, depending on your income level) that separated the interesting sliver of content from...everything else.

jillesvangurp|3 years ago

He was definitely onto something in that book where people also resort to using blockchains to fingerprint their behavior and build an unbreakable chain of authenticity. Later in that book that is used to authorize the hardware access of the deceased and uploaded individuals.

A bit far out there in terms of plot but the notion of authenticating based on a multitude of factors and fingerprints is not that strange. We've already started doing that. It's just that we currently still consume a lot of unsigned content from all sorts of unreliable/untrustworthy sources.

Fake news stops being a thing as soon as you stop doing that. Having people sign off on and vouch for content needs to start becoming a thing. I might see Joe Biden saying stuff in a video on Youtube. But how do I know if that's real or not?

With deep fakes already happening, that's no longer an academic question. The answer is that you can't know. Unless people sign the content. Like Joe Biden, any journalists involved, etc. You might still not know 100% it is real but you can know whether relevant people signed off on it or not and then simply ignore any unsigned content from non reputable sources. Reputations are something we can track using signatures, blockchains, and other solutions.

Interesting with Neal Stephenson that he presents a problem and a possible solution in that book.

afro88|3 years ago

I can see a world where in person consumption of creative media (art, music, movies etc), where all devices are to be left at the door, becomes more and more sought after and lucrative.

If the AI models can't consume it, it can't be commoditised and, well, ruined.

whatshisface|3 years ago

I don't think it will "poison the well" so much as change it - images that humans like more will get a higher pagerank, so the models trained on Google Images will not so much as degrade as they will detach from reality and begin to follow the human mind they way plausible fiction does.

joshspankit|3 years ago

Just yesterday I was speculating that current AI is bad at math because math on the internet is spectacularly terrible.

I think you’re right, and it’s unlikely that we (society) will convince people to label their AI content as such so that scraping is still feasible.

It’s far more likely that companies will be formed to provide “pristine training sets of human-created content”, and quite likely they will be subscription based.

trhway|3 years ago

>“pristine training sets of human-created content”

well, we do have organic/farmed/handcrafted/etc. food. One can imagine information nutrition label - "contains 70% AI generated content, triggers 25% of the daily dopamine release target".

kleer001|3 years ago

How would that really happen? It seems to me you're assuming that there's no such thing as extant databases of actual oil paintings, that people will stop producing, documenting, and curating said paintings. I think the internet and curated image databases are far more well kept than your proposed model accounts for.

qz_kb|3 years ago

My hypothetical example is not really about oil paintings, but the fact these models will surely get deployed and used for stock photos for articles, on art pages etc.

I think this will introduce unavoidable background noise that will be super hard to fully eliminate in future large scale data sets scraped from the web, there's always going to be more and more photorealistic pictures of "cats" "chairs" etc. in the data that are close to looking real but not quite, and we can never really go back to a world where there's only "real" pictures, or "authentic human art" on the internet.

abel_|3 years ago

On the contrary -- the opposite will happen. There's a decent body of research showing that just by training foundation models on their outputs, you amplify their capabilities.

Less common opinion: this is also how you end up with models that understand the concept of themselves, which has high economic value.

Even less common opinion: that's really dangerous.

rajnathani|3 years ago

For better training data in the future: Storing a content hash and author identification (an example proprietary solution right now [0]) of image authors, and having a decentralized reputation system for people/authors would help be the solution for better training data in the future whereby authors can gain reputation/incentives too.

[0] https://creativecloud.adobe.com/discover/article/how-to-use-...

gwern|3 years ago

I don't think it will be a big deal, for multiple different reasons: https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-...

dclowd9901|3 years ago

Eventually the only jobs humans will have is training AI to act human. Sounds very Philip K Dick now that I think about it.

actionfromafar|3 years ago

The transition will be complete when some AI can fool/bribe the other AIs that its workers are human.

kulikalov|3 years ago

Good looking images will be popular, bad looking images will be disposed on the backyard of the internet. Even if next iterations of these models will be trained on AI-generated images, the dataset will be well filtered by how much people like those images. After all, that's the purpose of any art, right?

LoveMortuus|3 years ago

Maybe we'll go back to index based search engines like Yahoo. Could resolve many issues we see today, but I think the biggest question is scalability. Maybe some open source open database system?

benlivengood|3 years ago

I think instead the images people want to put on the Internet will do the same for these models as adversarial training did for AlphaZero; it will learn what kinds of images engage human reaction.

VMG|3 years ago

It will not be limited to the internet. Have you looked at a magazine stand in the last 10 years? The content looks generated (not by AI) even today.

Cheap books, cheap TV and cheap music will be generated.

bowmessage|3 years ago

I also worry about the potential to further stifle human creativity, e.g. why paint that oil painting of a panda riding a bicycle when I could generate one in seconds?

telesilla|3 years ago

Our imaginations are gigantic. We'll find something else impressive and engaging to do. Or not care. I'm not worried. Watch children: they find a way to play even when there is nothing.

richrichardsson|3 years ago

One reason:

A digital picture of an oil painting != an actual oil painting

Of course once someone trains an AI with a robotic arm to do the actual painting, then your worry holds firm.

Gigachad|3 years ago

I wonder if google images could just seed in some generated images when none relevant are found..

unknown|3 years ago

[deleted]

ismepornnahi|3 years ago

Adding a watermark to all AI generated images should be imperative.