top | item 45054031

thegeomaster | 6 months ago

Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on?


ankit219|6 months ago

Oh, nothing official. There are people who estimate model sizes from tok/s, cost, benchmarks, etc. The estimate most people go by is https://lifearchitect.substack.com/p/the-memo-special-editio.... This guy estimated Claude 3 Opus to be a 2T-param model (given the pricing and speed). Opus 4 is 1.2T params according to him (but then I don't understand why the price remained the same). Sonnet is estimated by various people to be around 100B-200B params.

[1]: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...

NoahZuniga|6 months ago

If you're using the API cost of the model to estimate its size, then you can't use this size estimate to estimate the inference cost.

thegeomaster|6 months ago

tok/s cannot in any way be used to estimate parameters. It's a tradeoff made at inference time. You can adjust your batch size to serve 1 user at a huge tok/s or many users at a slow tok/s.
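
To make the batch-size point concrete, here is a rough sketch using a toy roofline model of one decode step. Everything in it is an assumption for illustration: the function name `decode_throughput`, the hardware numbers (3 TB/s memory bandwidth, 1 PFLOP/s compute per device), 2 bytes per parameter, ideal scaling across devices, and ignoring KV-cache traffic and communication overhead. It is not a description of how any provider actually serves its models.

```python
def decode_throughput(active_params, batch_size, num_devices=1,
                      mem_bw=3.0e12, peak_flops=1.0e15, bytes_per_param=2):
    """Toy roofline estimate: return (per_user_tok_s, aggregate_tok_s).

    All defaults are hypothetical hardware numbers, not measurements.
    """
    # Each decode step streams the active weights from memory once,
    # no matter how many sequences share the batch.
    memory_time = active_params * bytes_per_param / (mem_bw * num_devices)
    # Each sequence costs roughly 2 FLOPs per active parameter per token.
    compute_time = 2 * active_params * batch_size / (peak_flops * num_devices)
    # The step takes as long as the slower of the two.
    step_time = max(memory_time, compute_time)
    per_user = 1.0 / step_time          # tokens/s seen by one user
    return per_user, batch_size * per_user


# Same hypothetical 100B-active model, different batch sizes.
for batch in (1, 64, 1000):
    per_user, total = decode_throughput(100e9, batch)
    print(f"100B, batch={batch:5d}: {per_user:8.1f} tok/s per user, {total:10.1f} tok/s total")

# A 4x larger model spread over 4x the devices, compared at batch size 1.
small = decode_throughput(100e9, 1, num_devices=1)
large = decode_throughput(400e9, 1, num_devices=4)
print(f"100B on 1 device: {small[0]:.1f} tok/s, 400B on 4 devices: {large[0]:.1f} tok/s")
```

Under these assumptions the per-user speed of one fixed model changes with the batch size the operator picks, and two models of different size can show similar per-user speed depending on how much hardware is behind them, so an observed tok/s alone does not pin down a parameter count.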

Der_Einzige|6 months ago

Not everyone uses MoE architectures. It's not outlandish at all...

thegeomaster|6 months ago

There's no way Sonnet 4 or Opus 4 are dense models.