vacuumcl | 1 year ago

Compared to the numbers here (https://www.anthropic.com/news/claude-3-family), Llama 400B's seem slightly lower, but of course that's just a checkpoint they benchmarked, and they are still training it further.

causal | 1 year ago

Indeed. But if GPT-4 is actually 1.76T as rumored, an open-weight 400B is quite the achievement even if it's only just competitive.

cjbprime | 1 year ago

The rumor is that it's a mixture-of-experts model, which can't be compared directly on parameter count like this, because most of the weights are unused in any given forward pass. (So it's possible that a 400B non-MoE model is roughly the same "strength" as a 1.8T MoE one.)
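
To make that concrete, here's a rough back-of-the-envelope sketch in Python. All layer sizes, expert counts, and vocab size below are made-up assumptions for illustration, not the actual GPT-4 or Llama configs; the point is just that a MoE model's total parameter count can be several times larger than the number of parameters actually touched per token.

    # Back-of-the-envelope parameter counts: dense vs. mixture-of-experts.
    # All sizes below are invented for illustration, not real model configs.

    def param_counts(n_layers, d_model, d_ff, n_experts, experts_per_token, vocab=128_000):
        """Rough parameter estimate for a decoder-only transformer whose
        feed-forward block is (optionally) a mixture of experts."""
        attn = n_layers * 4 * d_model * d_model          # Q, K, V, O projections
        ffn_per_expert = n_layers * 2 * d_model * d_ff   # up + down projections
        emb = vocab * d_model
        total = attn + n_experts * ffn_per_expert + emb
        active = attn + experts_per_token * ffn_per_expert + emb  # used per token
        return total, active

    # Dense ~400B-class model: every weight participates in every forward pass.
    dense_total, dense_active = param_counts(120, 16_384, 65_536, n_experts=1, experts_per_token=1)

    # MoE ~1.8T-class model: 16 experts per layer, 2 routed per token.
    moe_total, moe_active = param_counts(120, 12_288, 34_000, n_experts=16, experts_per_token=2)

    print(f"dense: {dense_total/1e9:.0f}B total, {dense_active/1e9:.0f}B active per token")
    print(f"moe:   {moe_total/1e9:.0f}B total, {moe_active/1e9:.0f}B active per token")

With these invented numbers the MoE model totals about 1.7T parameters but only routes ~275B of them per token, which is in the same ballpark as the dense model's ~390B, so the two aren't as far apart as the headline counts suggest.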