sqs | 3 months ago
Here are some early rough numbers from our own internal usage on the Amp team (avg cost $ per thread):
- Sonnet 4.5: $1.83
- Opus 4.5: $1.30 (earlier checkpoint last week was $1.55)
- Gemini 3 Pro: $1.21
Cost per token is not the right way to look at this. A bit more intelligence means fewer mistakes (and fewer wasted tokens).
localhost | 3 months ago
Much better to look at cost per task - and good to see some benchmarks reporting this now.
leo_e | 3 months ago
If a cheaper model hallucinates halfway through a multi-step agent workflow, I burn more tokens on verification and error correction loops than if I just used the smart model upfront. 'Cost per successful task' is the only metric that matters in production.
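The retry math behind "cost per successful task" can be sketched like this (all dollar figures and success rates here are hypothetical, chosen only to illustrate the point; the model assumes independent retries until success, so expected cost is cost per attempt divided by success probability):

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost per successful task, assuming each attempt
    independently succeeds with probability `success_rate` and you
    retry until success (mean of a geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate

# Hypothetical numbers: a cheap model that fails often can cost more
# per completed task than a pricier model that usually succeeds.
cheap = cost_per_success(cost_per_attempt=0.80, success_rate=0.40)  # $2.00
smart = cost_per_success(cost_per_attempt=1.30, success_rate=0.95)  # ~$1.37
```

This ignores the verification and error-correction loops mentioned above, which only widen the gap against the cheaper model.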
andai | 3 months ago
ArtificialAnalysis has an "intelligence per token" metric on which all of Anthropic's models are outliers.
For some reason, they need far fewer output tokens than everyone else's models to pass the benchmarks.
(There are of course many issues with benchmarks, but I thought that was really interesting.)
sqs | 3 months ago
If you use very long threads and treat them as a long-and-winding conversation, you will get worse results and pay a lot more.