We got better and better models when we threw more and more compute at them? I gotta work on my snarkiness. Seriously, that's pretty good empirical evidence. The smaller models we get are all some kind of distillation or student model of a larger model, so they can never claim they are not the result of large compute.
t_mann|11 months ago
reportgunner|11 months ago
neverokay|11 months ago