thethirdone | 3 months ago
Diffusion just lets you spend more compute in a single pass so you don't redundantly access the same memory. It can only improve speed beyond the memory-bandwidth limit by committing multiple tokens each pass.
Other linear models like Mamba get away from O(n^2) effects, but the type of neural architecture is orthogonal to the method of generation.
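The bandwidth argument can be made concrete with a back-of-envelope sketch: if every forward pass must stream the full weights from memory, the tokens-per-second ceiling scales with how many tokens you commit per pass. All numbers below (model size, bandwidth, tokens per pass) are illustrative assumptions, not measurements.

```python
# Back-of-envelope: decode-speed ceiling when each pass streams all weights.
# Assumed numbers, chosen only to illustrate the scaling.

weight_bytes = 14e9    # e.g. a 7B-parameter model at 2 bytes/param (assumed)
mem_bandwidth = 1e12   # 1 TB/s memory bandwidth (assumed)

def tokens_per_second_ceiling(tokens_per_pass: int) -> float:
    """Upper bound on decode speed if each forward pass reads the full
    weights once, amortized over the tokens committed in that pass."""
    passes_per_second = mem_bandwidth / weight_bytes
    return passes_per_second * tokens_per_pass

autoregressive = tokens_per_second_ceiling(1)  # one token per weight read
multi_token = tokens_per_second_ceiling(8)     # 8 tokens committed per pass

print(f"1 token/pass ceiling:  {autoregressive:.0f} tok/s")
print(f"8 tokens/pass ceiling: {multi_token:.0f} tok/s")
```

The ceiling is linear in tokens per pass, which is the whole advantage claimed for committing multiple tokens at once; actual throughput also depends on compute, KV-cache traffic, and how many passes the sampler needs per committed token.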