top | item 46704971 (no title) _0ffh | 1 month ago Yup, as soon as data is the bottleneck and not compute, diffusion wins. Tested following the Chinchilla scaling strategy from 7M to 2.5B parameters.https://arxiv.org/abs/2507.15857 discuss order hn newest No comments yet.
No comments yet.