ofirpress | 1 month ago

Benchmarks can get costly to run. You can reach out to frontier model creators and ask them for free credits, but usually they'll only agree to that once your benchmark is pretty popular.

Dolores12|1 month ago

So basically, they'd know that requests made with your API key should be treated with care?

swyx|1 month ago

They could, but you can also trust Anthropic to have some integrity there; these are earnest people.

"Trust but verify", ofc. Artificial Analysis (https://latent.space/p/artificialanalysis) use API keys but also do mystery shopper checks.

epolanski|1 month ago

The last thing a proper benchmark should do is reveal its own API key.

plagiarist|1 month ago

IMO it should need a third party running the LLM anyway. Otherwise the evaluated company could notice they're receiving the same requests daily and discover benchmarking that way.

sejje|1 month ago

That's a good thought I hadn't had, actually.

mohsen1|1 month ago

Yes, I reached out to them, but as you say, it's a chicken-and-egg problem.

Thanks!