(no title)
lieret | 5 months ago
We've all read & analyzed a large number of agent trajectories. This loophole seems to be something that popped up with the more recent models and we simply weren't aware of it.
As discussed in the github issue, there's a fix in the new version of the SWE-bench containers (currently being rolled out) that makes sure that the relevant commits aren't available.
Part of what makes SWE-bench a very interesting benchmark is the enormous action space that agents that compete on it can take. However that also means that there's unexpected things happening when models get better. We're currently working on making all agent runs easily browsable on a website (rather than having to download our AWS buckets) to get even more eyes on the trajectories. Thanks to everyone who uncovered this loophole.
No comments yet.