top | item 31358322

kastnerkyle | 3 years ago

I really like the descriptions from SUNDAE (https://arxiv.org/abs/2112.06749) if you have some background about general neural net style modeling, and generally find the multinomial or binomial diffusion settings a bit simpler to think about conceptually (if a bit more difficult in practice due to the harshness of the noise). There are other papers focused on these settings too (even the origin diffusion work in the NN sphere http://proceedings.mlr.press/v37/sohl-dickstein15.html) but again - the math is at the forefront (https://arxiv.org/abs/2111.14822 , https://arxiv.org/abs/2102.05379)

But a lot of the diffusion literature does focus on the math, since finding tighter bounds, proving that things converge to the true likelihood, etc. are current and recent contributions in research (cf. https://proceedings.neurips.cc/paper/2021/hash/c11abfd29e4d9... or https://proceedings.neurips.cc/paper/2021/hash/b578f2a52a022...)

The summaries by Yang Song (https://yang-song.github.io/blog/2021/score/) and Lilian Weng (https://lilianweng.github.io/posts/2021-07-11-diffusion-mode...) are arguably the definitive summaries, but there is math there too.

Personally, I find the idea of training a model that learns to go stepwise from more noise to less noise pretty intuitive (we used to call this iterative inference, I guess), but that simple message does get wrapped up in proofs and theorems quite a lot in the literature right now.
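The (more noise -> less noise) stepwise view can be sketched as a generic reverse loop. The shrink-toward-zero `toy_step` below is a placeholder of my own standing in for a trained denoising network, not any paper's actual sampler:

```python
import numpy as np

def sample(denoise_step, T, shape, rng):
    # Generic reverse process: start from pure Gaussian noise and apply
    # a learned denoising step T times, each step removing a little noise.
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        x = denoise_step(x, t)
    return x

# Placeholder "model": shrink toward zero. A real model would predict
# (and subtract) the noise conditioned on the timestep t.
toy_step = lambda x, t: 0.9 * x
out = sample(toy_step, T=50, shape=(4,), rng=np.random.default_rng(0))
```

The key point is the loop: compute at prediction time scales with T, which is exactly the "variable compute" relaxation compared to a one-shot generator.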

If you make an analogy to GAN generators, which go from noise to data in one shot (and presumably might need to do this kind of iteration/denoising implicitly and internally), you are in a sense relaxing the modeling problem and allowing for variable compute time at prediction, as opposed to trying to train a GAN with a huge number of layers in the generator.

Similar analogies hold for the VAE formulation, seeing it as a mapping from Gaussian noise (the latent Z) to data via the decoder, following the tradition of latent-variable modeling setups like LDA, with the encoder being a practical and useful necessity for mapping into that latent space (some early slides from Durk Kingma and Max Welling present the VAE in this light; the "plate diagram" representation of the VAE highlights it particularly well). Similar analogies also hold for flow-based models, and are frequently used to define and teach flow-based generative models (https://lilianweng.github.io/posts/2018-10-13-flow-models/).

Ultimately (in my opinion) each of these branches has its own "math corner" people spend time in: minimax game stuff for GANs, ELBO / bounds for VAEs (or deriving new priors), bijection / invertibility for flows, and now noise schedules for diffusion. Just part of research, I guess.

But these diffusion models are pretty straightforward to train, and pretty powerful in my experience so far; definitely worth cutting through the noise if you are interested in generative models but (like me) aren't overly invested in the math parts.

Jascha's slides with the "dye in water" analogy (starting around slide 18, https://www.lri.fr/TAU_seminars/videos/Jascha_Sohl_Dickstein...) are a great intuitive introduction to the concept.
