kastnerkyle | 2 years ago

I disagree with this. Binarized MNIST samples of any reasonable quality are (still) tricky to get right without a hierarchical system (read: VQ-VAE tokens or some such encoder space). The same goes for really solid CIFAR-10 results. "Scaling down" is a different problem than scaling up; not everything transfers, but saying "everything works on MNIST / CIFAR-10" in generative modeling is a bit glib.

I would much prefer to see early work with solid small-scale results on arXiv than have people hold concepts for another 6 months while scaling up. Let that be for a v2; if you cannot put early but concrete results on arXiv, where else is there?

Recall that a lot of nice papers were mostly MNIST / CIFAR-10-level results at first, followed by scale (thinking of VQ-VAE, PixelCNN / RNN, PerceiverAR, and many others that worked well at scale later). That doesn't mean every result will scale up, but we have a lot of tricks for scaling "small-scale" models using pretrained latent spaces and so on (see the sketch below). The first diffusion results were also pretty small scale... a different time, but I don't think things are so different today.
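
To make the "pretrained latent spaces" point concrete, here is a minimal PyTorch sketch of the two-stage recipe: freeze a pretrained VQ-VAE, encode images into discrete token indices, and train a small autoregressive prior over those tokens instead of raw pixels. The `vqvae` object, its `encode` method, and all hyperparameters are hypothetical placeholders, not any specific paper's implementation.

    # Minimal sketch of a two-stage setup: a small prior trained on tokens
    # from a frozen, pretrained VQ-VAE. `vqvae` / `vqvae.encode` are
    # hypothetical placeholders for whatever stage-1 model you have.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TokenPrior(nn.Module):
        """Small causal Transformer over discrete VQ-VAE token indices."""
        def __init__(self, vocab_size=512, dim=256, seq_len=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.pos = nn.Parameter(torch.zeros(seq_len, dim))
            layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, num_layers=4)
            self.head = nn.Linear(dim, vocab_size)

        def forward(self, tokens):
            # tokens: (batch, seq) integer indices from the frozen encoder
            x = self.embed(tokens) + self.pos[: tokens.size(1)]
            # causal mask: each position attends only to earlier tokens
            mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
            h = self.blocks(x, mask=mask)
            return self.head(h)  # next-token logits

    prior = TokenPrior()
    # One training step, assuming a frozen stage-1 model:
    #   with torch.no_grad():
    #       tokens = vqvae.encode(images)   # (batch, seq) discrete codes
    #   logits = prior(tokens[:, :-1])
    #   loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
    #                          tokens[:, 1:].reshape(-1))
    tokens = torch.randint(0, 512, (8, 64))  # stand-in for real codes
    print(prior(tokens[:, :-1]).shape)       # torch.Size([8, 63, 512])

Samples then come from rolling out the prior token-by-token and decoding with the frozen VQ-VAE decoder, which is exactly how a small model inherits the heavy lifting done by the pretrained latent space.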

That said, I can agree that you need to be a bit in the weeds on the research side to dive deep on this, but I expect lots of follow-up clarifications or blog posts on this type of work.
