There frankly needs to be a paper calling this out though, because at this point there are a bunch of industry models following “llama laws” and nobody’s really done the research; it’s all monkey see, monkey do.
If industry groups want to run a training run based on the configurations of a well-performing model, I don't see anything wrong with that. Now, if they were to claim that what they are doing is somehow "optimal", then there would be something to criticize.
swyx|2 years ago
arugulum|2 years ago