top | item 41260648

(no title)

hbrundage | 1 year ago

Isn't the regression from 63% to 54% on MMLU-Pro a huge issue? They said it excels at advanced reasoning, but that seems like a big drawback there.

kainan-ai|1 year ago

Yeah, it doesn't win in every category. I will say, watching it in the Discord, I saw its performance vary widely, so the context and system prompt play a huge role. Initially it did great and solved some pretty heavy logic questions, but after the context was loaded with trolling it degraded quite a bit and couldn't solve problems it had previously been able to.