dnr | 3 months ago

The inelegance, to me, isn't in the definition of the operation. It's that the operation does a huge amount of brute-force work to mix every part of the input with every other part, when the answer really only depends on a tiny fraction of the input. If we somehow "just knew" which parts to look at, we could get the answer much more efficiently.

Of course, that doesn't really make any sense at the matrix level. And (from what I understand) techniques like MoE move in that direction. So the criticism doesn't really apply anymore, except that brains are still much, much more efficient than LLMs, so we know we could do better.
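
To make the MoE point concrete, here is a rough toy sketch of top-k routing, as far as I understand the general idea. Everything in it is made up for illustration: `make_expert`, `moe_forward`, and `gate_weights` are hypothetical names, and the "experts" are trivial scaling functions standing in for real feed-forward blocks. It is not how any particular model implements it.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a small list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical "expert": a trivial scaling function standing in for a
    # feed-forward block. It counts how often it is actually evaluated.
    def expert(x):
        expert.calls += 1
        return [scale * v for v in x]
    expert.calls = 0
    return expert

def moe_forward(x, gate_weights, experts, k=2):
    # Router: one cheap score per expert (dot product with its gate vector).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never run.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * yv for o, yv in zip(out, y)]
    return out, top

random.seed(0)
dim, n_experts = 4, 8
experts = [make_expert(scale=i + 1) for i in range(n_experts)]
# Assumption: random gate vectors; in a real model these would be learned.
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [random.gauss(0, 1) for _ in range(dim)]

out, chosen = moe_forward(x, gate_weights, experts, k=2)
calls = sum(e.calls for e in experts)
print(f"experts evaluated: {calls} of {n_experts}")
```

The router still looks at every expert's score, but that is one dot product each. The expensive part only runs for the `k` experts it picks, which is the "just knowing what to look at" flavor, at a coarse level.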
