cubefox | 9 days ago
This doesn't mention the main drawback of diffusion language models, which is also the main reason nobody is using them: they score significantly lower on benchmarks than autoregressive models of similar size.