The SWE-Bench scores are remarkably high for an open-source model of this size. 46.8% beats o3-mini (with Agentless-lite) and Claude 3.6 (with AutoCodeRover), though it is a little lower than Claude 3.6 with Anthropic's proprietary scaffold. Considering you can run it almost for free, this is an extraordinary model.
AstroBen|9 months ago
echelon|9 months ago
sagarpatil|9 months ago
svantana|9 months ago
falcor84|9 months ago
oofbaroomf|9 months ago