colinflaherty | 11 months ago

Colin here, author of the post - would love to answer questions about this.

And make sure to try out the open-source repo! It's a super easy starting point for experimenting with coding agents. It's nearly one-click to run agents in isolated Docker containers on SWE-bench Verified problems, ensemble the results, and run the SWE-bench evaluation harness to compute scores.

Check it out here: https://github.com/augmentcode/augment-swebench-agent

arunchaganty|11 months ago

Nice! I know it's super hard to SOTA a benchmark, especially a few years into it, so congrats on the milestone!