item 38634579

filterfiber | 2 years ago

I know the Hugging Face leaderboard isn't wildly accurate.

But the top models right now are almost all under 70B. Most are 7B, and the top is 10B. If the benchmarks are even remotely accurate, then this is rather wild.

Apparently multiple groups found different "secret sauces", namely Upstage and whatever UNA is?

appplication | 2 years ago

I mean, it isn’t too surprising that smaller models do better. I imagine transformers are as prone to overfitting as any statistical model. Also, there is probably some selection bias: bigger models are more expensive, and there are just fewer people training and iterating with them.

spott | 2 years ago

There are orders of magnitude fewer people playing with large (>40B) parameter models than the small ones, which means even fewer people finetuning those models.

I can’t imagine this is anything but selection bias.