aplassard | 6 months ago | on: Evals in 2025: going beyond simple benchmarks to build models people can use
I think cost should also be a direct consideration. Model performance on benchmarks varies wildly once you hold the models to a fixed budget.
https://substack.com/@andrewplassard/note/p-173487568?r=2fqo...