item 43366557

volodia | 11 months ago

The LLaDA paper is a scaled-up version of this paper; they cite it as an anonymous ICLR submission.

m00x | 11 months ago

I'm not sure if this is what you mean, but LLaDA isn't block text diffusion. This paper is a mix of an autoregressive model and a diffusion model, which is brand new.

ashirviskas | 11 months ago

It is soft-block text diffusion. They load one fixed-size super-block and only allow the model to unmask tokens by working through the soft-blocks in order. Since the source code is available, I was able to change it into actual block diffusion, but because the model was trained only on super-blocks, it kept trying to generate EOS tokens at the end of each block before I extended it. I've tried a few workarounds that half worked, but I guess a very small-scale finetune is needed to resolve it fully.
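For anyone unfamiliar with the mechanism being described, here is a minimal toy sketch of block-wise masked-diffusion sampling: the sequence starts fully masked and is revealed block by block, committing a few predicted tokens per denoising step. The `toy_model` function, the `MASK` sentinel, and all parameters are placeholders I made up for illustration; a real implementation would call the trained denoiser and pick positions by confidence rather than at random.

```python
import random

MASK = -1  # hypothetical mask-token id, stand-in for the real vocab entry

def toy_model(seq):
    """Placeholder denoiser: proposes a token id for every masked position.
    A real model would return transformer logits conditioned on seq."""
    return [random.randrange(10) if t == MASK else t for t in seq]

def sample_blockwise(length, block_size, steps_per_block):
    """Semi-autoregressive masked-diffusion sampling: blocks are unmasked
    left to right; within a block, a fraction of masked positions is
    committed at each denoising step."""
    seq = [MASK] * length
    for start in range(0, length, block_size):
        block = list(range(start, min(start + block_size, length)))
        for _ in range(steps_per_block):
            masked = [i for i in block if seq[i] == MASK]
            if not masked:
                break
            preds = toy_model(seq)
            # commit roughly an even share of the block per step
            k = max(1, len(masked) // steps_per_block)
            for i in random.sample(masked, k):
                seq[i] = preds[i]
        # make sure the block is fully unmasked before moving on
        for i in block:
            if seq[i] == MASK:
                seq[i] = toy_model(seq)[i]
    return seq
```

The failure mode described above corresponds to the model placing high probability on EOS at every block boundary because training only ever showed it fixed-size super-blocks, so the loop ends blocks prematurely unless the sampler suppresses EOS or the model is finetuned on the new block schedule.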