top | item 47108184


xscott | 9 days ago

Of course I can't be certain, but I think the "mixture of experts" design plays into it too. Metaphorically, there's a mid-level manager who looks at your prompt and tries to decide which experts it should be sent to. If he thinks you won't notice, he saves money by sending it to the undergraduate intern.

Just a theory.


victorbjorklund | 9 days ago

Note that MoE doesn't mean different experts for different types of problems. Routing happens per token and isn't really connected to problem type.

So if you send some Python code, the first token in a function can go to one expert, the second to another expert, and so on.
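
Here's a rough sketch of what per-token routing looks like, assuming a toy top-k softmax gate. Everything in it is made up for illustration: the function `route_tokens`, the random `token_vectors` and `gate_weights`, and the sizes. A real model uses learned gates inside each MoE layer, so treat this as a picture of the idea rather than any particular model's implementation.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_tokens(token_vectors, gate_weights, top_k=2):
    """Pick top_k experts for each token independently.

    gate_weights[e] is the (hypothetical) gate vector for expert e.
    Returns one list of (expert_index, weight) pairs per token.
    """
    routes = []
    for vec in token_vectors:
        # One gate score per expert: dot product of token vector and gate vector.
        scores = [sum(v * w for v, w in zip(vec, gate)) for gate in gate_weights]
        probs = softmax(scores)
        # Keep only the top_k experts for this token and renormalize their weights.
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in top)
        routes.append([(e, probs[e] / norm) for e in top])
    return routes

rng = random.Random(0)
dim, num_experts = 8, 4
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
# Random stand-ins for learned token representations and gate weights.
token_vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in tokens]
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

routes = route_tokens(token_vectors, gate_weights)
for tok, route in zip(tokens, routes):
    print(tok, [e for e, _ in route])
```

The gate only ever sees one token's vector at a time, so nothing in it knows "this is a Python question"; the expert choice is made separately for every token.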

dotancohen | 9 days ago

Can you back this up with documentation? I don't believe that this is the case.