(no title)
jcorco | 6 months ago
The approach is to use workloads defined by developers and end users (not providers) that reflect their real-world tasks. E.g. in finance, delivering market snapshots to trading engines. We test full stacks, holding some layers constant so you can isolate the effect of hardware, software, or models. Every run goes through an independent third-party audit to ensure consistent conditions, no cherry-picking of results, and full disclosure of config and tuning, so that the results are reproducible and the comparisons are fair.
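To make the "hold some layers constant" idea concrete, here is a minimal sketch of how audited runs could be paired so that only one layer differs between them. Everything in it is an assumption for illustration: the layer names, the `Run` record, the `latency_us` metric, and the `isolated_comparisons` helper are all hypothetical and not taken from any actual benchmark harness.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical layer names; a real stack would likely have more layers.
LAYERS = ("hardware", "software", "model")


@dataclass(frozen=True)
class Run:
    """One benchmark run over a full stack (illustrative record only)."""
    hardware: str
    software: str
    model: str
    latency_us: float  # placeholder metric, not a real measurement


def isolated_comparisons(runs, vary):
    """Return pairs of runs that differ in the `vary` layer and match on all others."""
    if vary not in LAYERS:
        raise ValueError(f"unknown layer: {vary}")
    held = [layer for layer in LAYERS if layer != vary]
    pairs = []
    for a, b in combinations(runs, 2):
        same_held = all(getattr(a, layer) == getattr(b, layer) for layer in held)
        if same_held and getattr(a, vary) != getattr(b, vary):
            pairs.append((a, b))
    return pairs
```

Calling `isolated_comparisons(runs, "hardware")` keeps only the pairs where software and model are identical, so any difference in the metric can be attributed to the hardware layer rather than to a mix of changes.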
In finance, the benchmarks are trusted enough to drive major infrastructure decisions by the leading banks and hedge funds, and in some cases to inform regulatory discussions, e.g. around how the industry handles time synchronization.
We're now starting to apply the same principles to AI benchmarking. Would love to talk to anyone who wants to be involved.
khalic | 6 months ago
So the business model would be AI foundries contracting you to evaluate their models?
Do you envision some kind of freely accessible platform for browsing the results?