top | item 43483723

yaronsc | 11 months ago

Benchmarks are a work in progress. We're thinking about measuring durability, task latency, and agent throughput. What else would you like to see?

namnnumbr | 11 months ago

Pass^k (all k attempts succeed) rather than Pass@k (at least one of k attempts succeeds); see https://www.philschmid.de/agents-pass-at-k-pass-power-k. It would be a great twofer if the code used to run the benchmarks were published as examples, too.
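For anyone who hasn't run into the distinction, here is a minimal sketch of the two metrics using the usual combinatorial estimators. It assumes each task was attempted n times with c successes. The function names `pass_at_k` and `pass_hat_k` and the numbers in the usage note are made up for illustration and are not taken from any benchmark harness.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = attempts per task, c = successful attempts, k = sample size.
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate of P(at least one of k sampled attempts succeeds)."""
    _check(n, c, k)
    # comb(n - c, k) counts the k-subsets made up only of failures;
    # it is 0 when there are fewer than k failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate of P(all k sampled attempts succeed), i.e. pass^k."""
    _check(n, c, k)
    # comb(c, k) counts the k-subsets made up only of successes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical task: 10 attempts, 7 of them successful, k = 3.
    print(f"pass@3 = {pass_at_k(10, 7, 3):.3f}")
    print(f"pass^3 = {pass_hat_k(10, 7, 3):.3f}")
```

With 7 successes out of 10 attempts and k = 3, pass@k comes out around 0.992 while pass^k is around 0.292, which is why pass^k says more about how reliable an agent is across repeated runs.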

yaronsc | 11 months ago

Will take a look, thanks!