The infancy period of this technology is fascinating.
Think about computer graphics 15 years ago. Beowulf came out in 2007 and was developed in the years before that, so let's call it 15+ years old. It sat right in the uncanny valley: it looked realistic without looking real. It was interesting visually, but my brain told me "this isn't correct".
And now some modern game engines do more realistic rendering than that in real time.
Now look at these generative models. Some state-of-the-art ones, with humans helping, are pretty convincing, but it's slow work. The more general ones like these make wonderfully interesting images that our brains immediately flag: "that's not correct".
But where will this technology be in another 15 years? I think the possibilities for entertainment are really interesting. Imagine a D&D game where the GM is vocally telling the AI what to generate, then making small tweaks, and the players are seeing the results.
Just yesterday I was asked if I was worried about losing my job to AI, and I smugly replied that we programmers will be needed even more, as interpreters. This article is an excellent explanation that I'll be sharing!
VQGAN+CLIP seems to have this dream-like quality where it generates images that are evocative of your prompts but don't actually depict them.
I find it fascinating because in some cases it's not as obvious as "lump of white fluffy matter" = "sheep", yet it still manages to evoke the prompt in our brains.
I'll sometimes get an unrecognizable blob, but when I quickly ask my SO "what is this?" she gets it... unless she consciously looks at it!
>quickly asking my SO "what is this?" she will get it... unless she consciously looks at it!
It does make you wonder about hypothetical artificial neural network-like "subconscious" layers and how "more conscious" prefrontal cortex layers potentially adjust predictions and perceptions based on their inputs. (Probably a convenient "just-so" "clockwork universe"-esque narrative unsupported by neuroscience research, though.)
It's definitely "injective but not bijective" or something like that.
Like, I look at the prompt "sheep grazing on a hillside by tim burton". I look at the pic. Brain goes, "yup, that checks out". You wouldn't necessarily derive the domain from the range (a preimage attack), but I can readily say, "if I fell asleep after watching Wallace and Gromit: A Close Shave and The Nightmare Before Christmas, this is what I would dream".
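To make that asymmetry concrete: checking "does this prompt explain this image?" is a cheap forward computation, but going from image back to prompt is a preimage search. A toy sketch, with a hash standing in for the generator (the `generate` function and its 8-byte "image" are entirely made up for illustration):

```python
import hashlib

def generate(prompt):
    """Toy stand-in for a text-to-image model: deterministic and cheap to run forward."""
    return hashlib.sha256(prompt.encode()).digest()[:8]  # fake 8-byte "image"

prompt = "sheep grazing on a hillside by tim burton"
image = generate(prompt)

# Forward direction is easy: does this prompt produce this image? Yes.
assert generate(prompt) == image

# Backward direction (image -> prompt) means searching the whole prompt space.
candidates = ["a cat", "a dog", prompt]
recovered = [p for p in candidates if generate(p) == image]
print(recovered)  # finds the prompt only because it happened to be in our candidate list
```

A real model is of course not a hash, but the "easy to verify, hard to invert" shape of the mapping is the same intuition.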
The underlying problem these elaborate prompts seem to solve is that the internet contains many pictures, few of which look very beautiful.
If you look at all internet pictures of sheep, many of them will not be very exciting and depict a low contrast sheep in a foggy landscape.
So to get a picture with strong saturation and clear lines, it helps to include text that is usually associated with pictures that have those qualities, like "HD wallpaper" or "made with unreal engine". Most "wallpapers" might be of dubious artistic quality, but muted colors and a lack of saturation will generally not be their problem.
This is of course not the only problem with the model. It doesn't even produce a clear image of a sheep, but that will probably get better with larger models and more training. Similarly, it doesn't seem to have a sense of overall composition and tends toward fractal or tiling-like images. But those problems are probably orthogonal to the fact that the model doesn't per se try to make good pictures, just average ones for the description you give it.
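The trick described above is just string manipulation on the prompt before it reaches the model. A minimal sketch of the idea (the `build_prompt` helper and the third modifier, "trending on artstation", are invented for illustration, not from any particular notebook):

```python
# Style tags that tend to co-occur with saturated, high-contrast images online.
STYLE_MODIFIERS = ["HD wallpaper", "made with unreal engine", "trending on artstation"]

def build_prompt(subject, modifiers):
    """Append comma-separated style tags so the model aims above 'average'."""
    return ", ".join([subject] + modifiers)

print(build_prompt("sheep grazing on a hillside", STYLE_MODIFIERS))
# -> sheep grazing on a hillside, HD wallpaper, made with unreal engine, trending on artstation
```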
I played around with these notebooks a while back, and wondered what you get if you jointly optimize for several different prompts. Has anyone tried this? (Or is this what the article is about?)
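For what it's worth, the usual way those notebooks handle multiple prompts is to embed each prompt, score the current image against each embedding, and combine the per-prompt losses into one scalar before the gradient step. A dependency-free toy of just the combination step (the 2-d "embeddings" and plain cosine similarity are stand-ins for CLIP):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def joint_loss(image_emb, prompt_embs, weights=None):
    """Combine per-prompt losses (1 - similarity) into one scalar to minimize."""
    weights = weights or [1.0] * len(prompt_embs)
    losses = [w * (1.0 - cosine(image_emb, p)) for w, p in zip(weights, prompt_embs)]
    return sum(losses) / sum(weights)

# An image embedding that matches prompt A perfectly and prompt B not at all:
img = [1.0, 0.0]
loss = joint_loss(img, [[1.0, 0.0], [0.0, 1.0]])
print(loss)  # 0.5 -- halfway between a perfect match (0.0) and orthogonal (1.0)
```

Optimizing against this averaged loss pulls the image toward a compromise between the prompts; the weights let you bias it toward one of them.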
Philosophical words have sometimes been part of a programmer's life, but they're not misleading: go in depth and you get the meaning. I must say it's a clever use of words.
As an aside, are there any good approaches for producing this kind of generative art on a CPU only system that lacks a GPU?
Just like programming was supposed to be the "new literacy"?
Fascinating.
https://www.youtube.com/watch?v=udPY5rQVoW0
It is oddly addictive.
https://photos.app.goo.gl/t41uLs3Wogmrgn887
https://www.youtube.com/watch?v=Jbn1aJuarIU
[filed under: "ultra cool comment trending as a meme on reddit" ;-) ]