MintsJohn | 2 years ago
It's always baffled me that Stability didn't offer a competitive UI platform for their models: Clipdrop is low quality and very bare-bones, and DreamStudio is pricey and still lacks most features. So this move to a new licensing strategy doesn't surprise me; it's actually somewhat comforting, as I expected them to just stop releasing further trained models (e.g. SDXL 1.1 and up) and only offer those through their services (of course, that can still happen), because how else were they going to monetize consumers? (I know they planned to offer custom trained/finetuned models to big corps, but that doesn't monetize consumers.)
However, like most releases by Stability these days, this one has a close-but-no-cigar feeling. The recent LCM LoRAs might be a little slower, but they actually offer 1024² resolution, work with any existing LoRAs and finetunes (so they are usable for iterative development, unlike this Turbo model: since it's a different model, you can't iterate on it and then expect SDXL, with LoRAs, or to a lesser extent without, to generate a similar image), and support CFG scale (and therefore negative prompts and prompt weighting). I suppose there's some niche market where you need all the speed you can get, but unless there's a giant leap in (temporal) consistency, that will remain niche; I don't see the mentioned real-time 3D "skinning" or the video img-to-img (frame-to-frame) gimmicks taking off with the current quality and lack of flexibility. It's good research, and optimizations have a lot of value, but it needs quality as well.
Their recent video model is quite bad as well, especially compared to Pika and Runway Gen-2, but as with the DALL-E 3 comparison, one can say those are closed source while Stability's offering is open.
Then we have the 3D model: closed source, and unfortunately worse than Luma's Genie.
The music model is nothing like Suno's Chirp (which might be multiple models, perhaps Bark and a music model used together), and the less said about their LLM offerings, the better.
Bottom line: Stability needs a killer model again. They started strong with Stable Diffusion 1.5, took a wrong turn with 2.0 (kind of recovered by 2.1, but the damage was done), and while SDXL isn't bad in a vacuum, neither was it the leap ahead that would put it in front of competition like Midjourney at the time, and DALL-E 3 a little later. Now even a relatively small model like PixArt-alpha, also open source, can offer quality similar to what SDXL offers (with a lot of caveats, as it has been trained on so few images that it simply lacks information on many concepts). More worrying, there's no hint of something better in Stability's pipeline. But maybe image gen is as good as Stability can get it, and they think they can make an impact pivoting in another direction, or multiple directions; currently, though, it feels like a master-of-none situation.