top | item 32471199

Open-source rival for OpenAI’s DALL-E runs on your graphics card

345 points | hardmaru | 3 years ago | mixed-news.com | reply

181 comments

[+] vanadium1st|3 years ago|reply
Stable Diffusion is mind-blowingly good at some things. If you are looking for modern artistic illustrations (like the stuff that you would find on the front page of Artstation), it's state of the art - better, in my opinion, than Dalle-2 and Midjourney.

But the interesting thing is that while it is so good at producing detailed artworks and matching the styles of popular artists, it's surprisingly weak at other things, like interpreting complex original prompts. We've all seen the meme pictures made in Craiyon (previously Dalle-mini) of photoshop-collage-like visual jokes. Stable Diffusion, with all its sophistication, is much worse at those and struggles to interpret a lot of prompts that the free and public Craiyon is great with. The compositions are worse, and it misses a lot of requested objects or even misses the idea entirely.

Also, as good as it is at complex artistic illustrations, it is just as bad at minimalistic and simple ones, like logos and icons. I am a logo designer and I am already using AI a lot to produce sketches and ideas for commercial logos, and right now the free and publicly available Craiyon is head and shoulders better at that than Stable Diffusion.

Maybe in the future we will have a universal winner AI that is the best at any style of picture you can imagine. But right now we have an interesting competition where different AIs have surprising strengths and weaknesses, and there's a lot of reason to try them all.

[+] PoignardAzur|3 years ago|reply
> Of course, with open access and the ability to run the model on a widely available GPU, the opportunity for abuse increases dramatically.

> “A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, it is our belief this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society.”

Holy shit.

On the one hand, I'm super excited by this technology, and the novel applications that will become possible with these open-source models (stuff that would never be usable if Google and OpenAI had a monopoly on image generation).

On the other hand, I really really really hope Bostrom's urn[0] has no black ball in it, because we as a society seem to be rushing to extract as many balls as possible over increasingly short timescales.

[0] https://nickbostrom.com/papers/vulnerable.pdf

[+] CM30|3 years ago|reply
I don't see why this is incorrect. Ever since DALL-E and Midjourney caught on, it seems like we've got more and more people trying to 'filter out' incorrect uses of their software, under the assumption that people cannot be trusted to just use it for whatever they want.

And it depresses me, because well... imagine if other pieces of tech were treated this way. If the internet or crypto or computers or whatever were heavily limited/restricted so the 'wrong people' couldn't use them for bad things. We'd consider it ridiculous, yet it's somehow accepted for these image generation systems.

[+] mortenjorck|3 years ago|reply
The length of the democratization cycle we're seeing – months to weeks between a breakthrough model and a competent open-source alternative that runs on commodity hardware – really highlights the genie-stuffing posture of Google and OpenAI. All the thoughtful, if highly paternalistic guardrails they build in amount to little more than fig leaves over the possible applications they intend to close off.

I'm personally in the "AI risk is overstated" camp. But if I'm wrong, all the top-down AI safety in the world is going to be meaningless in the face of a global network of researchers, enthusiasts, and tinkerers.

[+] jackblemming|3 years ago|reply
Yes, I feel much safer if OpenAI and Google are the sole keepers of such technology. They have my and the public's best interest at heart.
[+] Iv|3 years ago|reply
People will generate creepy porn and fake pictures. Humanity will survive this.
[+] sinenomine|3 years ago|reply
What if the black ball was a red herring all along, and the usual-suspect tech CEOs rushing to control said crystal ball are the real hazard?
[+] manquer|3 years ago|reply
Wouldn’t nuclear weapons or even plastics be a black ball already?

Humanity is not homogeneous; we will always react to new inventions or tools differently. Many will use them positively, some won't. Short of weapons of mass destruction, I am not sure anything else will destroy civilization itself.

[+] krono|3 years ago|reply
Either we equalise chaos or we reduce chaos, there exist no other options for entropy incarnate.
[+] colordrops|3 years ago|reply
Yet another "open" model that isn't open. We shall see if they actually do release to the public. We keep seeing promises from various orgs but it never pans out.
[+] TulliusCicero|3 years ago|reply
Their plan seems less hand-wavy; they're being explicit: "first we release it like this, then like that, then freely to everyone."

You're right that they could always change their minds, and that would suck, but so far they seem to be up front.

[+] bloppe|3 years ago|reply
OpenAI should probably rebrand lol the "open" part is basically parody at this point
[+] axg11|3 years ago|reply
I’m excited for the coming race to improve and miniaturise this tech. Apple has a great track record of making ML models light enough to run locally. There will come a day when photorealistic image generation can run on an iPhone.
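One of the standard miniaturisation tricks alluded to here is weight quantization. Below is a minimal sketch in plain Python of symmetric int8 post-training quantization; the function names and the toy weight list are illustrative assumptions, not any real framework's API:

```python
# Toy sketch of symmetric int8 post-training quantization: store weights as
# small integers plus one float scale, trading a little precision for ~4x
# less memory versus float32. Illustrative only; real on-device stacks
# (e.g. Core ML) quantize per channel using calibration data.

def quantize(weights):
    """Map floats onto the int8 range [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid 0 for all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats; per-weight error is bounded by scale / 2."""
    return [q * scale for q in quantized]

weights = [0.9, -1.2, 0.003, 0.5]
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q)  # small integers, one byte each in a real storage format
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The same idea, applied per layer and combined with pruning and fp16 inference, is roughly how large generative models get squeezed toward mobile hardware.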
[+] andrewacove|3 years ago|reply
Maybe this is their long term plan for getting rid of the camera bump.
[+] humanistbot|3 years ago|reply
Can someone tell me how this compares to the guide and repo shared a few days ago on HN: https://news.ycombinator.com/item?id=32384646
[+] sailingparrot|3 years ago|reply
This version is a bit more optimized, and better packaged. Also the model has been trained longer, so when the weights become publicly available the resulting quality should be much higher.
[+] humanistbot|3 years ago|reply
If anyone from Stability is reading, the confirmation e-mail to sign up is sending a broken link:

"We couldn't process your request at this time. Please try again later. If you are seeing this message repeatedly, please contact Support with the following information:

ip: XXXX

date: Mon Aug 15 2022 XX:XX:XX GMT-0700 (Pacific Daylight Time)

url: https://stability.us18.list-manage.com/subscribe/confirm"

[+] ruuda|3 years ago|reply
After the first paragraph, the site shows a notification in German that I need to enable JavaScript to use it. But below that is the full article, including images, which would be almost perfectly readable, except it's at 5% opacity (or maybe the JavaScript popup is overlaid on the article at 95% opacity), which makes it impossible to read. :'(
[+] belltaco|3 years ago|reply
The article says it needs 5.1 GB of graphics RAM.

Does anyone know how much download and disk storage it needs?

[+] _blop|3 years ago|reply
The v1.3 model weighs in at 4.3 GB. There's an additional one-time download of 1.6 GB of other models on first startup, because it uses Hugging Face's transformers. And the conda env takes another 6 GB due to pytorch and cuda.

Larger images will require (much) more than 5.1 GB. In my case, a target resolution of 768x384 (landscape) with a batch size of 1 will max out my 12 GB card, an RTX 3080 Ti.
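A rough mental model for why resolution hits VRAM so hard: the fixed cost is the weights, and on top of that the activations grow with the number of output pixels (self-attention grows even faster). A back-of-envelope sketch in Python - every constant below is an illustrative assumption, not a measured value:

```python
# Back-of-envelope VRAM model: fixed weight cost plus an activation cost that
# grows with pixel count. PER_PIXEL_BYTES is made up for illustration; real
# usage depends on precision, batch size, and the attention implementation.

WEIGHTS_GB = 4.3          # fp32 v1.3 checkpoint size, per the comment above
PER_PIXEL_BYTES = 25_000  # assumed activation overhead per output pixel

def estimated_vram_gb(width, height, batch_size=1):
    """Crude estimate: weights + activations scaling linearly in pixels."""
    activations_gb = width * height * PER_PIXEL_BYTES * batch_size / 1e9
    return WEIGHTS_GB + activations_gb

for w, h in [(512, 512), (768, 384), (768, 768)]:
    print(f"{w}x{h}: ~{estimated_vram_gb(w, h):.1f} GB")
```

With PER_PIXEL_BYTES tuned so that 768x384 lands near 12 GB, the toy model at least reproduces the shape of the observation above: doubling the pixel count roughly doubles the activation cost on top of the fixed weight footprint.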

[+] luismmolina|3 years ago|reply
If you read directly from the site, the minimum requirement for the graphics card is 10 GB of VRAM. Because it runs locally, you don't need to download anything apart from the initial model; the same goes for disk space.
[+] kgc|3 years ago|reply
Does this work on Apple silicon processors? They have plenty of RAM accessible to the GPU.
[+] sroussey|3 years ago|reply
The article says it will, but that it does not use the GPU, unfortunately.
[+] 999900000999|3 years ago|reply
Has anyone made a pixel art generator that can create animation sprites?
[+] gxqoz|3 years ago|reply
You can use DALL-E and other models to make pixel art (prompt "as pixel art"), although it can be both overkill and hard to get results consistent enough to put into an animation. I'm guessing that starting from more of a video model and then converting to pixel art could work better, although it's also non-trivial to turn "realistic" video into convincing animation.
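The "converting to pixel art" step gxqoz mentions can itself be mundane post-processing. A toy sketch in pure Python, treating an image as a list of rows of RGB tuples with a fixed six-color palette - all assumptions for illustration; a real pipeline would use PIL or numpy:

```python
# Toy post-processing sketch: turn a "realistic" RGB image into pixel art by
# nearest-neighbor downscaling, then snapping each pixel to a small fixed
# palette. Illustrative only; a real pipeline would derive the palette from
# the image and use a proper imaging library.

PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0),
           (0, 255, 0), (0, 0, 255), (255, 255, 0)]

def downscale(img, factor):
    """Nearest-neighbor: keep the top-left pixel of each factor x factor block."""
    return [row[::factor] for row in img[::factor]]

def nearest(color):
    """Snap a color to the closest palette entry (squared Euclidean distance)."""
    return min(PALETTE, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))

def pixelate(img, factor=4):
    return [[nearest(c) for c in row] for row in downscale(img, factor)]

# Example: an 8x8 gradient collapses to a 2x2 pixel-art swatch.
img = [[(x * 32, y * 32, 0) for x in range(8)] for y in range(8)]
print(pixelate(img, factor=4))
```

Nearest-neighbor downscaling keeps edges crisp (unlike averaging), and palette snapping gives the flat, retro look; frame-to-frame consistency for animation is the part no simple filter solves.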
[+] panabee|3 years ago|reply
Hi there - we're working on this, and have been working on a model for months now. Hope to release something soon. How best to get in touch with you?
[+] tckerr|3 years ago|reply
This is exactly the type of application I am interested in as well. As a hobby game dev with only mediocre pixel art skills, having a generator to finish the busy work would be an absolute lifesaver. I'm also interested in using it for fleshing out artistic vision through generating variations of an initial concept.

Hopefully we aren't more than a few years away from something practical like this.

[+] kragen|3 years ago|reply
This article says both that it's "open-source" and that it's "available for [only] research purposes upon request". These can't both be correct. Where is the error?
[+] zone411|3 years ago|reply
They jumped the gun with this announcement. I get wanting to share the excitement of AI doing something cool with the world (I've been there), but they should've waited until it was accessible to the public.
[+] Diris|3 years ago|reply
The code is open source, the models are not.
[+] upupandup|3 years ago|reply
My friend wants to know when she can use this to generate porn, are we close?
[+] unethical_ban|3 years ago|reply
Wait, so the closed source generator known as DALL-E is owned by a company called OpenAI?
[+] stuckinhell|3 years ago|reply
This is pretty amazing. Anyone have any tips on building a PC for machine learning with a RAID device?
[+] jessfyi|3 years ago|reply
Hasn't been updated since 2020, but Tim Dettmers' guide [0] is pretty much the gold standard for optimizing what to buy for whichever area of DL/ML you're interested in. The pricing has changed thanks to GPU prices coming back down to earth a bit, but what to look out for and how much RAM you need for which task hasn't. Check out the "TL;DR advice" section, then scroll back up for detailed info on the why and common misconceptions. For tips on a RAID/NAS setup alongside it, just head to the datahoarders subreddit and their FAQ.

[0] https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...

[+] cellis|3 years ago|reply
Look into building an ethereum mining machine... it can double as an ML workstation. That's what I did.
[+] hedora|3 years ago|reply
If you just want to try it out, consider using a remote CAD workstation from a company like Paperspace.

(No affiliation.)

[+] dustingetz|3 years ago|reply
Doesn't build on my Mac Studio due to a dependency whose Mac version is two major versions behind.
[+] fswd|3 years ago|reply
Unfortunately it's a commercial license and the model isn't available to the public so it isn't very useful.
[+] andybak|3 years ago|reply
It's going to be MIT from what I have heard. On phone atm so can't provide sources.
[+] th1s1sit|3 years ago|reply
Would be a blast if the cloud is upended by RISC and GPUs powerful enough to crunch "big data" at home.

Would love to see FAANG and SV crash and burn, margins chipped away to nothing.