collinwilkins | 13 days ago

At this point it seems new models all score within a few points of each other on SWE-bench. The actual differentiator is how well a model handles multi-step tool use without losing the plot halfway through, and how well it works with an existing stack.