fdsjgfklsfd | 6 months ago

Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture, such as diffusion-based LLMs?
