fdsjgfklsfd | 6 months ago
Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture? The diffusion-based LLMs?