Stable Diffusion is mind-blowingly good at some things. If you are looking for modern artistic illustrations (like the stuff you would find on the front page of Artstation), it's state of the art; better, in my opinion, than DALL-E 2 and Midjourney.
But the interesting thing is that while it is so good at producing detailed artworks and matching the styles of popular artists, it's surprisingly weak at other things, like interpreting complex original prompts. We've all seen the meme pictures made in Craiyon (previously DALL-E mini) of photoshop-collage-like visual jokes. Stable Diffusion, with all its sophistication, is much worse at those and struggles to interpret a lot of prompts that the free and public Craiyon handles well. The compositions are worse, and it misses many of the requested objects, or even misses the idea entirely.
Also, as good as it is at complex artistic illustrations, it is equally bad at minimalistic and simple ones, like logos and icons. I am a logo designer, and I am already using AI a lot to produce sketches and ideas for commercial logos; right now the free and publicly available Craiyon is head and shoulders better at that than Stable Diffusion.
Maybe in the future we will have a universal winner: an AI that is the best at any style of picture you can imagine. But right now we have an interesting competition in which different AIs have surprising strengths and weaknesses, and there's good reason to try them all.
> Of course, with open access and the ability to run the model on a widely available GPU, the opportunity for abuse increases dramatically.
> “A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, it is our belief this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society.”
Holy shit.
On the one hand, I'm super excited by this technology, and the novel applications that will become possible with these open-source models (stuff that would never be usable if Google and OpenAI had a monopoly on image generation).
On the other hand, I really really really hope Bostrom's urn[0] has no black ball in it, because we as a society seem to be rushing to extract as many balls as possible over increasingly short timescales.
[0] https://nickbostrom.com/papers/vulnerable.pdf
I don't see why this is incorrect. Ever since DALL-E and Midjourney caught on, it seems like we've had more and more people trying to 'filter out' incorrect uses of their software, under the assumption that people cannot be trusted to just use it for whatever they want.
And it depresses me, because well... imagine if other pieces of tech were treated this way. If the internet or crypto or computers or whatever were heavily limited/restricted so the 'wrong people' couldn't use them for bad things. We'd consider it ridiculous, yet it's somehow accepted for these image generation systems.
The length of the democratization cycle we're seeing – months to weeks between a breakthrough model and a competent open-source alternative that runs on commodity hardware – really highlights the genie-stuffing posture of Google and OpenAI. All the thoughtful, if highly paternalistic, guardrails they build in amount to little more than fig leaves over the applications they intend to close off.
I'm personally in the "AI risk is overstated" camp. But if I'm wrong, all the top-down AI safety in the world is going to be meaningless in the face of a global network of researchers, enthusiasts, and tinkerers.
What if the black ball was a red herring all along, and the hands of the usual-suspect tech CEOs rushing to control said urn are the real hazard?
Wouldn’t nuclear weapons or even plastics be a black ball already?
Humanity is not homogeneous; we will always react to new inventions and tools differently. Many will use them positively, some won't. Short of weapons of mass destruction, I am not sure anything else will destroy civilization itself.
Yet another "open" model that isn't open. We shall see if they actually do release it to the public. We keep seeing promises from various orgs, but they never pan out.
I’m excited for the coming race to improve and miniaturise this tech. Apple has a great track record of making ML models light enough to run locally. There will come a day when photorealistic image generation can run on an iPhone.
This version is a bit more optimized, and better packaged.
Also the model has been trained longer, so when the weights become publicly available the resulting quality should be much higher.
If anyone from Stability is reading, the confirmation e-mail to sign up is sending a broken link:
"We couldn't process your request at this time. Please try again later. If you are seeing this message repeatedly, please contact Support with the following information:
ip: XXXX
date: Mon Aug 15 2022 XX:XX:XX GMT-0700 (Pacific Daylight Time)
url: https://stability.us18.list-manage.com/subscribe/confirm"
After the first paragraph, the site shows a notification in German saying I need to enable JavaScript to use it. Below that is the full article, including images, which would be almost perfectly readable, except it's rendered at 5% opacity (or maybe the JavaScript popup is overlaid at 95% opacity), which again makes it impossible to read. :'(
The v1.3 model weighs in at 4.3 GB. There's an additional download of 1.6 GB of other models due to usage of huggingface's transformers (only once on startup). And the conda env takes another 6 GBs due to pytorch and cuda.
Larger images will require (much) more than 5.1 GB. In my case, a target resolution of 768x384 (landscape) with a batch size of 1 will max out my 12GB card, an RTX3080Ti.
According to the site, the minimum graphics-card requirement is 10 GB of VRAM. Because it runs locally, you don't need to download anything apart from the initial model; the same goes for disk space.
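As a sanity check on the figures in the comments above, here's a back-of-the-envelope sketch. The 8x latent downsampling factor and the quadratic self-attention scaling are assumptions about the v1-style architecture, not measurements, and the costs are illustrative ratios rather than real memory numbers:

```python
# Rough arithmetic for the disk and VRAM figures quoted above.
# Assumptions (not measured): the UNet works on an 8x-downsampled
# latent, and self-attention memory scales with (latent pixels)^2.

def latent_pixels(width: int, height: int, downsample: int = 8) -> int:
    """Number of latent-space pixels the UNet actually processes."""
    return (width // downsample) * (height // downsample)

# Disk: model weights + auxiliary models + conda env, per the comment above.
disk_gb = 4.3 + 1.6 + 6.0
print(f"approx. disk footprint: {disk_gb:.1f} GB")

base = latent_pixels(512, 512)   # native training resolution
big = latent_pixels(768, 384)    # the landscape target that maxed out 12 GB

# Convolutions and activations grow roughly linearly with latent pixels...
print(f"linear cost growth:    {big / base:.3f}x")
# ...but self-attention grows with the square, which is one reason a
# modest resolution bump can blow well past the stated 10 GB minimum.
print(f"attention cost growth: {(big / base) ** 2:.3f}x")
```

Even this toy model shows why a mere 12% increase in pixel count can push a 12 GB card over the edge once the quadratic terms kick in.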
Check out NUWA-Infinity[0][1], submitted to arXiv on Jul 20, 2022. It captures artistic style very well (though I can't speak to the quality of the pixel art it would generate) and can do image-to-video.
[0] https://nuwa-infinity.microsoft.com/#/ [1] https://arxiv.org/abs/2207.09814
You can use DALL-E and other models to make pixel art (append "as pixel art" to the prompt), although it can be both overkill and hard to get results consistent enough to put into animation. I'm guessing that starting from more of a video model and then converting to pixel art could work better, although it's also non-trivial to turn "realistic" video into convincing animation.
This is exactly the type of application I am interested in as well. As a hobby game dev with only mediocre pixel art skills, having a generator to finish the busy work would be an absolute lifesaver. I'm also interested in using it for fleshing out artistic vision through generating variations of an initial concept.
Hopefully we aren't more than a few years away from something practical like this.
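The "converting to pixel art" step discussed above is often just nearest-neighbour downscaling plus snapping each pixel to a small palette. A toy, dependency-free sketch of that idea (a real pipeline would use an image library; the image data and palette here are made up):

```python
# Toy pixel-art reduction: nearest-neighbour downscale, then snap each
# pixel to the closest entry in a small fixed palette.

def downscale_nearest(img, factor):
    """Keep every `factor`-th pixel in each dimension (nearest-neighbour)."""
    return [row[::factor] for row in img[::factor]]

def snap_to_palette(img, palette):
    """Replace each RGB pixel with its closest palette entry (squared distance)."""
    def closest(px):
        return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, px)))
    return [[closest(px) for px in row] for row in img]

# A 4x4 "image" of RGB tuples standing in for a generated frame.
img = [[(10, 10, 10), (250, 0, 0), (10, 10, 10), (250, 0, 0)],
       [(0, 250, 0), (10, 10, 10), (0, 250, 0), (10, 10, 10)],
       [(10, 10, 10), (250, 0, 0), (10, 10, 10), (250, 0, 0)],
       [(0, 250, 0), (10, 10, 10), (0, 250, 0), (10, 10, 10)]]

palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]
small = snap_to_palette(downscale_nearest(img, 2), palette)
print(small)
```

Getting temporally consistent results out of generated video is the hard part; the quantization itself is cheap.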
This article says both that it's "open-source" and that it's "available for [only] research purposes upon request". These can't both be correct. Where is the error?
They jumped the gun with this announcement. I get wanting to share the excitement of AI doing something cool with the world (I've been there), but they should've waited until it was accessible to the public.
It hasn't been updated since 2020, but Tim Dettmers' guide [0] is pretty much the gold standard for optimizing what to buy for whichever area of DL/ML you're interested in. The pricing has changed thanks to GPU prices coming back down to earth a bit, but what to look out for and how much RAM you need for which task hasn't. Check out the "TL;DR advice" section, then scroll back up for detailed info on the why and common misconceptions. For tips on a RAID/NAS setup alongside it, just head to the datahoarders subreddit and their FAQ.
[0] https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...
adwi | 3 years ago:
Mobile-friendly version of the Bostrom paper:
https://onlinelibrary.wiley.com/doi/10.1111/1758-5899.12718
TulliusCicero | 3 years ago:
You're right that they could always change their minds, and that would suck, but so far they seem to be up front.
dang | 3 years ago:
Stable Diffusion launch announcement - https://news.ycombinator.com/item?id=32414811 - Aug 2022 (37 comments)
rasz | 3 years ago:
https://twitter.com/DiffusionPics/
Geee | 3 years ago:
Not sure how they compare. DD seems to be quite popular. I'm currently setting up DD locally.
thorum | 3 years ago:
https://reddit.com/r/stablediffusion
belltaco | 3 years ago:
Does anyone know how much data download and disk storage it needs?
_w1kke_ | 3 years ago:
https://colab.research.google.com/github/KaliYuga-ai/Pixel-A...
th1s1sit | 3 years ago:
Would love to see FAANG and SV crash and burn, margins chipped away to nothing.