

Are there any up-to-date offline/private agentic coding benchmark leaderboards?

If the tests haven't been published anywhere and are sufficiently different from standard problems, I would expect the benchmarks to be robust to intentional over-optimization.

Edit: These look decent and generally match my expectations:

https://www.apex-testing.org/
