item 40206925

gregschoeninger | 1 year ago

Hey all,

I thought the paper "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution" was a pretty cool idea, so I decided to dive deep into the code, strip it down until I could understand it, then train some models from scratch. My findings are linked here:

https://www.oxen.ai/blog/how-to-train-diffusion-for-text-fro...

I find diffusion papers a bit difficult to read, and looking at the actual inputs and outputs of the code really helps me grok what's going on.

Main takeaways are:

1) It remains to be seen whether these techniques will scale in both data and model size.

2) It's an interesting technique in general; kind of wild that the Monte Carlo sampling and denoising work at all.

3) The infilling isn't a super big selling point as-is, because the context length is fixed during diffusion. You'd have to layer in some hacks to make it work well for code completion or other use cases.
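To make point 3 concrete, here's a toy sketch of why the context length is fixed: in absorbing-state discrete diffusion, sampling starts from a fully masked canvas of a predetermined length and iteratively unmasks positions, so infilling means clamping known tokens on that fixed-size canvas. This is not the paper's actual sampler; the denoiser below is a random stand-in for the learned score network, and all names are hypothetical.

```python
import random

MASK = -1                 # hypothetical id for the absorbing "mask" token
VOCAB = list(range(10))   # toy vocabulary

def toy_denoiser(seq):
    # Stand-in for the learned network: propose a token for each
    # masked position. A real model would score the whole sequence.
    return {i: random.choice(VOCAB) for i, t in enumerate(seq) if t == MASK}

def sample(length, steps, clamp=None):
    # Fixed-length canvas: start fully masked, optionally clamp known
    # tokens (infilling). The length never changes during diffusion,
    # which is why variable-length completion needs extra hacks.
    seq = [MASK] * length
    for i, tok in (clamp or {}).items():
        seq[i] = tok
    for step in range(steps):
        proposals = toy_denoiser(seq)
        masked = list(proposals)
        # Unmask a fraction of the remaining masks each step
        # (a crude stand-in for the paper's noise schedule).
        k = max(1, len(masked) // (steps - step)) if masked else 0
        for i in random.sample(masked, min(k, len(masked))):
            seq[i] = proposals[i]
    # Fill any masks that survived the schedule.
    for i, t in enumerate(seq):
        if t == MASK:
            seq[i] = random.choice(VOCAB)
    return seq
```

Usage: `sample(8, 4, clamp={0: 3, 7: 5})` "infills" the middle of an 8-token sequence, but there's no way to ask for "however many tokens fit here," which is the limitation above.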

Curious what you all think about diffusion for text. Hopefully this gives people a jumping-off point for understanding the technique and implementing their own!

Props to @louaaron and his team at Stanford and Pika Labs for the initial paper and implementation.
