top | item 46704971

(no title)

_0ffh | 1 month ago

Yup, as soon as data is the bottleneck and not compute, diffusion wins. Tested following the Chinchilla scaling strategy from 7M to 2.5B parameters.

https://arxiv.org/abs/2507.15857

discuss

order

No comments yet.