item 32650432

1 week of Stable Diffusion

457 points | victormustar | 3 years ago | multimodal.art | reply

166 comments

[+] fab1an|3 years ago|reply
I think most are vastly underestimating the impact of Synthetic AI media - this is at least as big as the invention of film/photography (a century-level shift) and maybe as big as the invention of writing (a millennia-level shift). Once you really think through the consequences of the collapse of idea and execution, you'll likely tend to think the latter...

What we're seeing now are toy-era seeds for what's possible - e.g. I've been making a completely Midjourney-generated "interactive" film called SALT: https://twitter.com/SALT_VERSE/status/1536799731774537733

That would have been completely impossible just a few months ago. Incredibly exciting to think what we'll be able to do just one year from now..

[+] nerdponx|3 years ago|reply
Is it? I seriously doubt it.

Other than "you can't trust anything you don't see with your own eyes", what kind of shift is it? People lived like that for literally millennia before photography, audio, and video recording.

At absolute worst, we are only undoing about 150 years of development, and only "kind of" and only in certain scenarios.

Moreover, people were making convincing edits of street signs, etc. literally 20 years ago using just Photoshop. What does this really change at a fundamental level? Okay, so you can mimic voices and generate video, rather than just static images. But people have been making hoax recordings and videos for longer than we've had computers.

I think the effects of this stuff will be: 1) making it easier/cheaper to create certain forms of art and entertainment media, 2) making it easier to create hoaxes, and 3) we will eventually need to contend with challenges to IP law. That's about it. I think it will create a lot of value for a lot of people (sibling comment makes a good point about this being equivalent to CGI), but I don't see the big societal shift you're claiming that this is.

[+] seydor|3 years ago|reply
As big as the invention of CGI

Still, humans use art to communicate intent, and we still consider AIs to be 'things', with no agency or intent. Being an artist just became a lot harder, because no amount of technical prowess can make you stand out. It's all about the narrative now.

[+] time_to_smile|3 years ago|reply
> toy-era seeds

I think what we have is a toy and will remain a toy, just like Eliza was 60 years ago. Academically fascinating, and given the constraints of the era, genuinely remarkable, but still a long way from really being useful.

I'm already getting bored of seeing 95%-amazing, 5%-wtf AI-generated images; I can't fathom how anyone else remains excited about this stuff for so long. My Slack is filled with impressive-but-not-quite-right images of all sorts of outrageous scenarios.

But that's the catch. These diffusion models are stuck creating wacky or surreal images, because those contexts essentially let you ignore how badly these generators miss the mark.

Synthetic AI media won't even be as disruptive as Photoshop, let alone the creation of written language.

[+] anonAndOn|3 years ago|reply
This thought occurred to me recently while skimming the formulaic and indistinguishable programming on Netflix. It won't be long before a GPT-3 script is fed to an image generator and out come the components of a movie or TV show. The product will undoubtedly need some human curation and voice acting, but the possibility of a one-person production studio is on the horizon.
[+] paisawalla|3 years ago|reply
Agreed, it really does not seem far off now to imagine a world where I can request artifacts like

"This episode of Law & Order, but if Jerry Orbach never left the show"

"Final Fantasy VII as an FPS taking place in the Call of Duty universe"

"A 3D printable part that will enable automatic firing mode for {a given firearm}"

[+] deviner|3 years ago|reply
It doesn't bring anything new, just enhancement on top of what already exists - not even close to photography or film.
[+] Melatonic|3 years ago|reply
Doubt it - but it will become another great tool for artists to use.
[+] syntaxing|3 years ago|reply
It's really crazy how Stable Diffusion seems to be on par with DALL-E, yet you can run it on "most" hardware. Is there an equivalent for GPT-3? I don't even think I can run the 2M lite GPT-J on my computer…
[+] ManuelKiessling|3 years ago|reply
Tangential: I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.

You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...

Talk to it using the /draw Slash Command.

It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.

Oh and get your prompt ideas from https://lexica.art if you want good results.

PS: Anyone knows where to host reliable NVIDIA-equipped VMs at a reasonable price?

[+] folli|3 years ago|reply
Funny how Reddit banned the mentioned subs in such a short amount of time.

Some years ago, the pendulum was very much on the other side.

[+] gillesjacobs|3 years ago|reply
7 days and already that many UIs, plugins, and integrations released. To be fair, developer/researcher access came a bit earlier, but that's still an impressive adoption speed.
[+] danso|3 years ago|reply
Tangent discussion: What are people's experiences here with running Stable Diffusion locally? I've installed it but haven't had time to play around, and I have an RTX 3060 8GB GPU -- IIRC, the official SD docs say that 10GB is the minimum, but I've seen posts/articles saying it can be done with 8GB.

Mostly I'm interested in the processing time. Like, using a midrange desktop, what's the average time to expect SD to produce an image from a prompt? Minutes/Tens of minutes/Hours?

[+] cube2222|3 years ago|reply
RTX 3080 (10GB) here

Remember to keep the batch size low (probably equal to 1) - that was my main issue when I first installed this.

Then, there are lots of great forks already which add an interactive REPL or web UI [0][1]. They also run with half precision, which saves memory. Additionally, they optionally integrate with upscaling neural networks, which means you can generate 512x512 images with Stable Diffusion and then easily scale them up to 1024x1024. They also optionally integrate with face-fixing neural networks, which can drastically improve the quality of images.

There's also this ultra-optimized repo, but it's a fair bit slower [2].

[0]: https://github.com/lstein/stable-diffusion

[1]: https://github.com/hlky/stable-diffusion

[2]: https://github.com/basujindal/stable-diffusion
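Half precision saves far more than a few bytes: weights take half the space. A rough sketch of the savings (the ~860M parameter count assumed here for the SD v1 UNet is for illustration only; activations, the VAE, and the text encoder all add more on top):

```python
# Rough VRAM footprint of model weights at different precisions.
# The ~860M UNet parameter count is an assumption for illustration.
UNET_PARAMS = 860_000_000

def weight_footprint_gib(params: int, bytes_per_param: int) -> float:
    """Return weight storage size in GiB."""
    return params * bytes_per_param / 1024**3

fp32 = weight_footprint_gib(UNET_PARAMS, 4)  # float32: 4 bytes per weight
fp16 = weight_footprint_gib(UNET_PARAMS, 2)  # float16: 2 bytes per weight

print(f"fp32 weights: {fp32:.2f} GiB")  # ~3.20 GiB
print(f"fp16 weights: {fp16:.2f} GiB")  # ~1.60 GiB
print(f"saved:        {fp32 - fp16:.2f} GiB")
```

That ~1.6 GiB of headroom is what makes the difference between fitting in 8 GB of VRAM and not.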

[+] chrismorgan|3 years ago|reply
ASUS Zephyrus G15 (GA503QM) with a laptop 3060 (95W, I think) with 6GB of VRAM, basujindal fork, does 512×512 at about 3.98 iterations per second in turbo mode (for which there’s plenty of memory at that size). That’s under 15 seconds per image on even small batches at the default 50 steps, and I think it was only using around 4.5GB of VRAM.

(I say "I think" because I've uninstalled the nvidia-dkms package again while I'm not using it, because having a functional NVIDIA dual-GPU system in Linux is apparently too annoying: Alacritty takes a few seconds to start because it blocks on spinning up the dGPU for a bit, for some reason, even though it doesn't use it; wake from sleep takes five or ten seconds instead of under one second; Firefox glyph and icon caches for individual windows occasionally (mostly on wake) get blatted (that's actually mildly concerning, though so long as the memory corruption is only in GPU memory it's probably OK); and if the nvidia modules are loaded at boot time, Sway requires --unsupported-gpu and my backlight brightness keys break because the device changes in the /sys tree and I end up with an 0644 root:root brightness file instead of the usual 0664 root:video, and I can't be bothered figuring it out or arranging a setuid wrapper or whatever. Yeah, now I'm remembering why I would have preferred a single-GPU laptop, to say nothing of the added expense of a major component that had gone completely unused until this week. But no one sells what I wanted without a dedicated GPU, for some reason.)
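As a quick sanity check on the arithmetic, using the figures quoted above (3.98 iterations per second, 50 steps):

```python
# Convert a reported sampler throughput into time per image.
def seconds_per_image(steps: int, iterations_per_second: float) -> float:
    return steps / iterations_per_second

t = seconds_per_image(50, 3.98)  # numbers reported in the comment above
print(f"{t:.1f} s per image")    # ~12.6 s, consistent with "under 15 seconds"
```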

[+] cbozeman|3 years ago|reply
Removing the NSFW and watermark modules from the model will easily allow you to run it with 8 GB VRAM (usually takes around 6.9 GB for 512x512 generations).

With an RTX 3060, your average image generation time is going to be around 7-11 seconds if I recall correctly. This swings wildly based on how you adjust different settings, but I doubt you'll ever require more than 70 seconds to generate an image.

[+] sarsway|3 years ago|reply
It's pretty fast on a RTX 3070 (8GB), a few seconds per image.

My first impression is that it seems a lot more useful than DALL-E, because you can quickly iterate on prompts and generate many batches, picking the best ones. To get something that's actually usable, you'll have to tinker around a bit and give it a few tries. With DALL-E, feedback is slower, and there's a reluctance to just hammer prompts because of credits.

[+] mlsu|3 years ago|reply
I have a dated 1070 with 8gb of vram, some of which also renders my desktop.

I was able to obtain 256x512 images with this card using the standard model, but ran into OOM issues.

I don't mind waiting, so now I am using the "fast" repo:

https://github.com/basujindal/stable-diffusion

With this, it takes 30s to generate a 768x512 image (any larger and I run into OOM issues again). I think you can expect it to be a bit faster at the same resolution with your 3060, because it's a faster card with the same amount of memory.

[+] Morgawr|3 years ago|reply
I have a Titan X (Pascal) from like 2015 with 12GB of VRAM, and I've had no trouble running it locally. I'd say it takes me about 30 seconds to generate a single image at 30 DDIM steps (which is about the bare minimum I consider for quick iterations). When I want higher-quality images, after I settle on a proper prompt, I set it to 100 or 200 DDIM steps, and that takes maybe 1 minute per picture (I didn't measure accurately). I usually just let it run for a few minutes in batches of 10 or 20 pictures while I go do something else, then come back 15-20 minutes later.

It runs pretty well, but the most I can get is a 768x512 image. That's pretty good for stuff like visual novel background art [0] and similar things, though.

[0] - https://twitter.com/xMorgawr/status/1564271156462440448
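Those bulk timings hang together; a back-of-the-envelope check using the per-image numbers quoted in the comment (reported estimates, not measurements):

```python
# Estimate wall-clock time for a bulk run from per-image timings.
def batch_minutes(images: int, secs_per_image: float) -> float:
    return images * secs_per_image / 60

print(batch_minutes(20, 30))  # 30 DDIM steps, ~30 s/image: 20 images -> 10.0 min
print(batch_minutes(20, 60))  # 100-200 DDIM steps, ~60 s/image: -> 20.0 min
```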

[+] wccrawford|3 years ago|reply
I had to get a different repo with "optimized commands" on the first day, but my 3070 8GB has been happily processing images in decent time.
[+] motoboi|3 years ago|reply
Take a moment to appreciate the fact that in 4.2 GB (less than that, actually) you have the English language somehow encoded.

This is mind blowing.
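One way to make that concrete: at 4 bytes per float32 weight, a ~4.2 GB checkpoint holds on the order of a billion parameters (back-of-the-envelope only; the split across the UNet, VAE, and text encoder is ignored here):

```python
# Back-of-the-envelope: how many float32 parameters fit in a 4.2 GB checkpoint.
CHECKPOINT_BYTES = 4.2e9  # size quoted in the comment above
BYTES_PER_FP32 = 4

params = CHECKPOINT_BYTES / BYTES_PER_FP32
print(f"~{params / 1e9:.2f} billion parameters")  # ~1.05 billion
```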

[+] stephc_int13|3 years ago|reply
AI generated art is interesting and will probably be helpful.

I see it as a cheap and fast alternative to paying a concept artist.

But not a revolution. Creating precise and coherent assets is going to be a challenge, at least with the current architecture.

From a research perspective this is, I think, much more than a toy, those models can help us better understand the nature of our minds, especially related to their processing of text, images and abstraction.

[+] Valakas_|3 years ago|reply
This is what a revolution looks like when it is happening.

Did you learn about the "Industrial revolution" or the "agricultural revolution" in class? That didn't take a week, or a year, or a decade to happen. Even the Internet revolution took more than a decade.

This is a revolution. And you're seeing it happen in real time.

[+] amelius|3 years ago|reply
I think what it shows us is that activities we think of as "human", like getting drunk, saying silly things that sound brilliant, or painting things that look stunning, are actually the things a machine has the least trouble copying.

Whereas things we associate more with computers, such as hard thinking and mathematics, turn out to be more difficult for a machine to copy, and therefore perhaps more "human".

[+] cududa|3 years ago|reply
I had dismissed DALL-E - very cool, but it won't really replace everyone. After playing with Stable Diffusion, as an artist, this is the most profound experience I've ever had with a computer. Check this out: https://andys.page/posts/how-to-draw/
[+] m_ke|3 years ago|reply
After playing with it for a few hours, I'm sold on it soon replacing all blog-spam media and potentially flooding Etsy with "artists" trying to pass the renders off as their own artwork.

Here's some of the stuff I generated: https://imgur.com/a/mfjHNgO

[+] andruby|3 years ago|reply
It must be interesting being a graphic artist in 2020-2022. First NFTs, which enabled some to make millions of dollars. Less than 2 years later, Stable Diffusion, which will probably shrink the market for human graphic artists significantly.
[+] ebabchick|3 years ago|reply
can someone recommend a good paper or blog post with an overview of the technical architecture of training and running stable diffusion?
[+] ok_dad|3 years ago|reply
I guess we know where the new market for all those Ethereum miners' GPUs will come from. I have always been somewhat bearish on the trend of throwing GPU power at neural nets and their descendants, but clearly there are amazing applications for this tech. I still think it's morally kind of wrong to copy an artist's style using an automated tool like this, but I guess we'll have to deal with that, because there's no putting this genie back in the bottle.
[+] poisonborz|3 years ago|reply
Just imagine - you could write your own script for a series and have it realistically generated, especially cartoons, complete with voice acting. Popular generated SpongeBob episodes could become canonical entries in the mind of the general public - after some informational fallout, the original episodes couldn't even be told apart. Postmodern pastiche will accelerate and become total.
[+] revskill|3 years ago|reply
7 days trying to install Python and its packages, and I failed. Had to remove that garbage - the global dependencies - from my machine. Such a wasteful ecosystem.
[+] gigel82|3 years ago|reply
I'm using the Docker one, so much easier and no worries of polluting my real environment (all the installation scripts tend to download a variety of things from a variety of places).
[+] digitallyfree|3 years ago|reply
OpenVINO Stable Diffusion (a CPU port of SD) is an easy install on Linux within a venv. Be sure to upgrade pip first before installing the required packages from the list. The lack of GPU acceleration and its associated baggage makes this much easier to set up and run.

https://github.com/bes-dev/stable_diffusion.openvino

[+] andybak|3 years ago|reply
I know several semi-non-technical people that have got this running locally.
[+] CWuestefeld|3 years ago|reply
It took me some time to get the OpenVINO distribution running on my Windows box. It turns out it wasn't compatible with Python 3.10; I had to go back to 3.9. Maybe that'll help you?
[+] marc_io|3 years ago|reply
I found it surprisingly easy to run it on a 2015 MacBook Pro.
[+] lijogdfljk|3 years ago|reply
That Figma plugin is mind blowing to me. I'm also curious to see how the Blender integration pans out
[+] coding123|3 years ago|reply
In 30 years, everything AI generates will be a red circle, because by that point it will just have been trained on itself repeatedly.

Instead of labeling data for what things are, we'll have to label things as being generated or not.

[+] throwaway888abc|3 years ago|reply
The collaboration, pace, and progress are stunning. If only this could be applied to other fields, such as climate change.

Great write-up.

[+] timost|3 years ago|reply
One use case I have in mind is manga drawing. I wonder if anybody has tested manga related generation.