top | item 39467098


JonathanFly | 2 years ago

From: https://twitter.com/EMostaque/status/1760660709308846135

Some notes:

- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.

- This takes advantage of transformer improvements & can not only scale further but also accept multimodal inputs.

- Will be released open; the preview is to improve its quality & safety, just like OG Stable Diffusion

- It will launch with full ecosystem of tools

- It's a new base taking advantage of latest hardware & comes in all sizes

- Enables video, 3D & more..

- Need moar GPUs..

- More technical details soon
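The "diffusion transformer combined with flow matching" in the first note can be illustrated with a toy training step. This is a minimal sketch under assumptions, not SD3's actual code: the linear `velocity_model` is a stand-in for a real transformer over latent patches, and all names and shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity_model(x_t, t, w):
    """Stand-in 'network': a single linear map. A real DiT would be a
    transformer over latent patches; this only illustrates the objective."""
    return x_t @ w * (1.0 - t)

def flow_matching_loss(x1, w):
    """One flow-matching (rectified-flow-style) training step:
    interpolate noise -> data linearly and regress the constant velocity."""
    x0 = rng.standard_normal(x1.shape)      # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))  # random time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1           # point on the straight path
    v_target = x1 - x0                      # velocity of that path
    v_pred = velocity_model(x_t, t, w)
    return np.mean((v_pred - v_target) ** 2)

x1 = rng.standard_normal((4, 8))  # pretend these are image latents
w = np.zeros((8, 8))              # untrained weights
loss = flow_matching_loss(x1, w)
```

The key difference from classic DDPM-style diffusion is the regression target: a constant velocity along a straight noise-to-data path, rather than the added Gaussian noise at a scheduled timestep.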

>Can we create videos similar like sora

Given enough GPUs and good data, yes.

>How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?

It's in sizes from 800M to 8B parameters now; it will come in all sizes for all sorts of deployment, from edge devices to giant GPUs.

(adding some later replies)

>awesome. I assume these aren't heavily cherry picked seeds?

No this is all one generation. With DPO, refinement, further improvement should get better.

>Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?

Yeah, see @Scenario_gg's great work with IP adapters, for example. Our team builds ComfyUI, so you can expect some really great stuff around this...

>Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.

I imagine the new version will. DALL-E and MJ are also pipelines; you can pretty much do anything accurately with pipelines now.

>Nice. Is it an open-source / open-parameters / open-data model?

Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.

>Cool!!! What do you mean by good data? Can it directly output videos?

If we trained it on video yes, it is very much like the arch of sora.


cheald|2 years ago

SD 1.5 is 983m parameters, SDXL is 3.5b, for reference.

Very interesting. I've been stretching my 12GB 3060 as far as I can; it's exciting that smaller hardware is still usable even with modern improvements.
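Napkin math on whether those parameter counts fit in 12GB of VRAM (weights only; the text encoder, VAE, and activations add more on top). By this estimate the 8B model at fp16 is about 15 GiB, so it wouldn't fit a 3060, while an int8 quantization (~7.5 GiB) would:

```python
def model_bytes(params, bits_per_weight):
    """Approximate weight-storage footprint only; runtime memory
    (activations, attention buffers, other pipeline models) is extra."""
    return params * bits_per_weight / 8

GiB = 1024 ** 3
for name, params in [("800M", 800e6), ("8B", 8e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {model_bytes(params, bits) / GiB:.1f} GiB")
```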

ttul|2 years ago

Stability has to make money somehow. By releasing an 8B parameter model, they’re encouraging people to use their paid API for inference. It’s not a terrible business decision. And hobbyists can play with the smaller models, which with some refining will probably be just fine for most non-professional use cases.

liuliu|2 years ago

I am going to look at quantization for the 8B. But also, these are transformers, so a variety of merging / Frankenstein-tuning is possible. For example, you can use the 8B model to populate the KV cache (which is computed once, so it can be loaded from slower devices such as RAM / SSD) and use the 800M model for diffusion, replicating its weights to match the layers of the 8B model.
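One way to read that layer-replication idea, as a sketch: spread the small model's layers across the big model's depth so each big-model layer slot has a donor. The layer counts below are made up for illustration, not the real models':

```python
def layer_map(n_small, n_big):
    """Evenly assign each of n_big layer slots to one of n_small
    available layers, so small-model weights can be replicated
    to the big model's depth."""
    return [round(i * (n_small - 1) / (n_big - 1)) for i in range(n_big)]

mapping = layer_map(12, 40)  # hypothetical depths
```

Each big-model slot would then reuse the nearest small-model layer's weights, while attending over the KV cache the 8B model produced once up front.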

memossy|2 years ago

800m is good for mobile, 8b for graphics cards.

Bigger than that is also possible, not saturated yet but need more GPUs.

VikingCoder|2 years ago

I'm curious - where are the GPUs with decent processing power but enormous memory? Seems like there'd be a big market for them.

wongarsu|2 years ago

Nvidia is making way too much money keeping cards with lots of memory exclusive to server GPUs they sell with insanely high margins.

AMD still suffers from limited resources and doesn't seem willing to spend much chasing a market that might just be temporary hype. Google's TPUs are a pain to use and seem to have stalled out. And Intel lacks commitment; even their products that went roughly in that direction aren't a great match for neural networks because of their philosophy of having fewer, more complex cores.

ls612|2 years ago

MacBooks with M2 or M3 Max. I’m serious. They perform like a 2070 or 2080 but have up to 128GB of unified memory, most of which can be used as VRAM.

SV_BubbleTime|2 years ago

I’ll bet you the Nvidia 50xx series will have cards that are asymmetric for this reason. But nothing that will cannibalize their gaming market.

You’ll be able to get higher resolution but slowly. Or pay the $2800 for a 5090 and get high res with good speed.

weebull|2 years ago

I think the AMD 8600XT is a move in this direction; otherwise there was little point in releasing it.

GPUs need a decent virtual memory system though. The current "it runs or it crashes" situation isn't good enough.

pbhjpbhj|2 years ago

Nvidia have a system for DMA between the GPU and system memory, GPUDirect. That seems like a potentially better route if latency can be handled well.

iosjunkie|2 years ago

I dream of AMD or Intel creating cards to do just that

p1esk|2 years ago

H200 has 141GB, B100 (out next month) will probably have even more. How much memory do you need?

netdur|2 years ago

> - Need moar GPUs..

Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?

memossy|2 years ago

We have highly efficient models for inference and a quantization team.

Need moar GPUs to do a video version of this model similar to Sora, now that they have proved that Diffusion Transformers can scale with latent patches (see stablevideo.com and our work on that model, currently the best open video model).

We have 1/100th of the resources of OpenAI and 1/1000th of Google etc.

So we focus on great algorithms and community.

But now we need those GPUs.

AnthonyMouse|2 years ago

> Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?

There is an inherent trade off between model size and quality. Quantization reduces model size at the expense of quality. Sometimes it's a better way to do that than reducing the number of parameters, but it's still fundamentally the same trade off. You can't make the highest quality model use the smallest amount of memory. It's information theory, not sorcery.
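A concrete version of that trade-off: symmetric per-tensor int8 quantization shrinks weights 4x versus fp32, and the rounding error it introduces is exactly the quality cost being described. A minimal numpy sketch (toy data, not a real model):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8: 4x smaller than fp32 weights,
    in exchange for rounding error."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)
q, scale = quantize_int8(w)
mean_err = np.abs(dequantize(q, scale) - w).mean()
# mean_err is small but strictly positive: the information lost to rounding.
```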

supermatt|2 years ago

I believe he means for training

albertzeyer|2 years ago

I understand that Sora is very popular, so it makes sense to refer to it, but when saying it is similar to Sora, I guess it actually makes more sense to say that it uses a Diffusion Transformer (DiT) (https://arxiv.org/abs/2212.09748) like Sora. We don't really know more details on Sora, while the original DiT has all the details.

tithe|2 years ago

Is anyone else struck by the similarities in textures between the images in the appendix of the above "Scalable Diffusion Models with Transformers" paper?

If you size the browser window right, paging with the arrow keys (so the document doesn't scroll) you'll see (eg, pages 20-21) the textures of the parrot's feathers are almost identical to the textures of bark on the tree behind the panda bear, or the forest behind the red panda is very similar to the undersea environment.

Even if I'm misunderstanding something fundamental here about this technique, I still find this interesting!

cchance|2 years ago

So is this "SDXL safe" or "SD2.1 safe"? SDXL-safe we can deal with; if it's 2.1-safe it's gonna end up DOA for a large part of the open-source community again.

astrange|2 years ago

SD2.1 was not "overly safe", SD2.0 was because of a training bug.

2.1 didn't have adoption because people didn't want to deal with the open replacement for CLIP. Or possibly because everyone confused 2.0 and 2.1.

weebull|2 years ago

Don't know about 3.0, but Cascade has different levels of safety between the full model and the light model. The full model is far more prudish, but both completely fail with some prompts.

swyx|2 years ago

> SDXL safe we can deal with

how exactly did the community deal with it? interested to learn how to unlearn safety

samstave|2 years ago

>>>How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?

>>>Its in sizes from 800m to 8b parameters now, will be all sizes for all sorts of edge to giant GPU deployment.

--

Can you fragment responses such that if an edge device (mobile app) is prompted for [thing], it can pass tokens upstream on the prompt, effectively torrenting responses? You could then place actual GPU edge devices in certain areas, like dense cities, which are expected to account for a ton of GPU cycle consumption around the edge.

So you have tiered processing: speed-critical work is done locally, quality level 1 can take some edge GPU, and corporate workloads can be handled in the cloud...

----

Can you fragment and torrent a response?

If so, how is that request torn up and routed to appropriate resources?

BOFH me if this is a stupid question (but it's valid given how quickly AI is becoming intrinsic to our society).

swyx|2 years ago

> Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.

Can someone explain how negation is currently done in Stable Diffusion? And why can't we do it in text LLMs?

scottmf|2 years ago

you can use negative logit bias
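For context on how "negation" usually works in Stable Diffusion pipelines: it's not handled inside the text encoder but at sampling time, via classifier-free guidance with a negative prompt. The sampler extrapolates from the negative-prompt noise prediction toward the positive one, pushing the sample away from whatever the negative prompt describes. A sketch of just the guidance combine step (the numbers are illustrative, not real model outputs):

```python
import numpy as np

def cfg_step(eps_negative, eps_positive, guidance_scale):
    """Classifier-free guidance combine: extrapolate from the
    negative/unconditional prediction toward the positive one."""
    return eps_negative + guidance_scale * (eps_positive - eps_negative)

# Toy noise predictions for one two-dimensional latent.
eps_neg = np.array([0.0, 1.0])  # conditioned on the negative prompt
eps_pos = np.array([1.0, 0.0])  # conditioned on the positive prompt
guided = cfg_step(eps_neg, eps_pos, guidance_scale=7.5)
```

With no negative prompt, the first slot is the unconditional prediction, which is the plain CFG formulation; a negative prompt simply replaces "unconditional" with "conditioned on what you don't want".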

sandworm101|2 years ago

>> all sorts of edge to giant GPU deployment.

Soon the GPU and its associated memory will be on different cards, as once happened with CPUs. The day of the GPU with RAM slots is fast approaching. We will soon plug terabytes of RAM into our 4090s, then plug a half-dozen 4090s into a Raspberry Pi to create a Cronenberg rendering monster. Can it generate movies faster than Pixar can write them? Sure. Can it play Factorio? Heck no.

jsheard|2 years ago

Any separation of a GPU from its VRAM is going to come at the expense of (a lot of) bandwidth. VRAM is only as fast as it is because the memory chips are as close as possible to the GPU, either on separate packages immediately next to the GPU package or integrated onto the same package as the GPU itself in the fanciest stuff.

If you don't care about bandwidth you can already have a GPU access terabytes of memory across the PCIe bus, but it's too slow to be useful for basically anything. Best case you're getting 64GB/sec over PCIe 5.0 x16, when VRAM is reaching 3.3TB/sec on the highest end hardware and even mid-range consumer cards are doing >500GB/sec.

Things are headed the other way if anything: Apple and Intel are integrating RAM onto the CPU package for better performance than is possible with socketed RAM.
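Back-of-envelope with the figures above, assuming a sampler has to read all 8B fp16 weights (16 GB) once per denoising step:

```python
def stream_seconds(n_bytes, gb_per_s):
    """Time to move n_bytes at a given rate in decimal GB/s."""
    return n_bytes / (gb_per_s * 1e9)

weights = 8e9 * 2  # 8B parameters at fp16 = 16 GB

pcie = stream_seconds(weights, 64)    # PCIe 5.0 x16: 0.25 s per full pass
hbm = stream_seconds(weights, 3300)   # top-end HBM: ~0.005 s per pass
```

A quarter-second per pass over PCIe, multiplied by dozens of denoising steps, is why "just page the weights over the bus" isn't a practical substitute for local VRAM.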

weebull|2 years ago

No it won't. GPUs are good at ML partly because of the huge memory bandwidth: thousands of bits wide. You won't find connectors that have that many terminals and maintain signal quality. Even putting a second soldered bank on the same signals can be enough to mess things up.

zettabomb|2 years ago

I doubt it. The latest GPUs utilize HBM which is necessarily part of the same package as the main die. If you had a RAM slot for a GPU you might as well just go out to system RAM, way too much latency to be useful.

ltbarcly3|2 years ago

I don’t think you really understand the current trends in computer architecture. Even cpus are being moved to have on package ram for higher bandwidth. Everything is the opposite of what you said.