(no title)
storus
|
9 days ago
Didn't thinking tokens resolve the most problematic part of autoregressive models (the first few tokens set the constraints the model can't overcome later) and give it a massive advantage compared to diffusion models by showing the thinking trace? I can see diffusion models being used as a draft model to quickly predict a bunch of tokens and let the autoregressive model decide to use them or throw them away quickly, speeding it up considerably while keeping thinking traces available.
blurbleblurble|9 days ago
Check out this paper where they use diffusion during inference on the autoencoded prediction of an autoregressive model: https://openreview.net/forum?id=c05qIG1Z2B