Benchmarks can get costly to run. You can reach out to frontier model creators to try to get free credits, but usually they'll only agree to that once your benchmark is pretty popular.
IMO it should require a third party to run the LLM anyway. Otherwise the evaluated company could notice it's receiving the same requests daily and discover the benchmarking that way.
Dolores12|1 month ago
swyx|1 month ago
"trust but verify" ofc. Artificial Analysis (https://latent.space/p/artificialanalysis) do API keys but also mystery shopper checks.
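For anyone wondering what a mystery shopper check could look like in practice, here is a rough sketch. It assumes you already have benchmark scores from two sources: runs made with the provider-issued key and runs made with an anonymous account the provider can't link to the benchmark. The function name, the tolerance value, and the sample scores are all hypothetical, not anything Artificial Analysis has published.

```python
from statistics import mean

def mystery_shopper_gap(official_scores, shopper_scores, tolerance=0.02):
    """Compare scores from a provider-issued key against anonymous runs.

    Returns (gap, suspicious), where gap is the difference in mean score
    and suspicious is True when the gap exceeds the tolerance.
    The tolerance of 0.02 is an arbitrary illustrative default.
    """
    if not official_scores or not shopper_scores:
        raise ValueError("need at least one score from each source")
    gap = mean(official_scores) - mean(shopper_scores)
    return gap, abs(gap) > tolerance

# Hypothetical pass rates, not real measurements.
gap, suspicious = mystery_shopper_gap([0.81, 0.80, 0.82], [0.74, 0.75, 0.73])
print(f"gap={gap:.3f} suspicious={suspicious}")
```

A real check would also need enough runs to separate a true gap from ordinary sampling noise, so a fixed tolerance like this is only a starting point.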
Deklomalo|1 month ago
[deleted]
epolanski|1 month ago
plagiarist|1 month ago
sejje|1 month ago
mohsen1|1 month ago
Thanks!