Colin here, author of the post. Would love to answer questions about this.
colinflaherty|11 months ago
And make sure to try out the open-source repo! It's a super easy starting point for experimenting with coding agents. It's nearly one-click to run agents in isolated Docker containers on SWE-bench Verified problems, ensemble the results, and run the SWE-bench evaluation harness to compute scores.
Check it out here: https://github.com/augmentcode/augment-swebench-agent
arunchaganty|11 months ago
nuatsimon|11 months ago