rain1 | 8 months ago
> it somehow merged Llama 4 Maverick's custom Arena chatbot version with Behemoth
I can clarify this part. I wrote: 'There was a scandal as Facebook decided to mislead people by gaming the lmarena benchmark site - they served one version of Llama 4 there and released a different model', which is true.
But it sits inside the section about the Llama 4 model Behemoth, so I see how that could be confusing or misleading.
I could restructure that section a little to improve it.
> Llama 405B was also trained on more than 15 trillion tokens[1],
You're talking about Llama 405B Instruct; I'm talking about the Llama 405B base model. Of course the instruct model has been trained on more tokens.
> why is there such a focus on token training count?
I tried to include a rough training-token count for each model I wrote about, plus additional details about the training-data mixture where available. Training data is an essential part of what defines an LLM.
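For a sense of why the count matters: together with the parameter count, the token count roughly determines training compute. A minimal sketch (my own illustration, not from the post), using the common ~6 * N * D FLOPs heuristic from the scaling-law literature, with the 405B / 15T figures discussed above:

    # Rough training-compute estimate via the common 6*N*D heuristic.
    # Illustrative only; real budgets vary with architecture and setup.
    def approx_training_flops(params: float, tokens: float) -> float:
        return 6 * params * tokens

    # Llama 3 405B base: ~405e9 parameters, ~15e12 tokens
    flops = approx_training_flops(405e9, 15e12)
    print(f"~{flops:.2e} FLOPs")  # roughly 3.6e25 FLOPs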