(no title)
carsoon | 2 months ago
But these raw models (which I test through direct API calls) are much better. The biggest change with regard to price came from mixture of experts (MoE), which kept quality very similar while dropping compute by roughly 10x, since only a few experts run for each token. (This is what allowed DeepSeek V3 to have similar quality to GPT-4o at a much lower price.)
This same tech has most likely been applied to these new models, so now we may have 1T-100T? parameter models with the same cost as 4o through mixture of experts. (This is what I'd guess, at least.)
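To make the arithmetic behind that guess concrete, here is a rough sketch of how total and per-token active parameters diverge in an MoE model. The function name and every number in it are hypothetical, picked only for illustration and not taken from any real model. It also assumes per-token compute scales roughly with the active parameter count.

```python
def moe_param_counts(num_experts, experts_per_token, params_per_expert, shared_params):
    """Return (total, active) parameter counts for a simplified MoE model.

    shared_params stands in for everything that runs on every token
    (attention, embeddings); only experts_per_token experts run per token.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + experts_per_token * params_per_expert
    return total, active


# Hypothetical config: 64 experts of 1B parameters each, 4 routed per token,
# plus 4B parameters that are always active.
total, active = moe_param_counts(
    num_experts=64,
    experts_per_token=4,
    params_per_expert=1_000_000_000,
    shared_params=4_000_000_000,
)

print(f"total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B")
print(f"compute reduction vs. a dense model of the same size: {total / active:.1f}x")
```

With these made-up numbers, a 68B model only runs 8B parameters per token, about 8.5x less compute than a dense model of the same size. That is the same ballpark as the 10x figure above.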