_0ffh|1 month ago
You'd be surprised how quickly the improvement of autoregressive language models levels off with epoch count (though, admittedly, one epoch is a LOT). Diffusion language models, otoh, indeed keep profiting for much longer, fwiw.
zozbot234|1 month ago
_0ffh|1 month ago
https://arxiv.org/abs/2507.15857