top | item 43416828

ezyang | 11 months ago

Yeah, Sonnet 3.5/3.7 are doing the heavy lifting. Maybe the SOTA Gemini models would do better, I haven't tried them. Generating correct patches is a funny minigame that isn't really solved, despite how easy it is to RL on.

diggan|11 months ago

> Maybe the SOTA Gemini models would do better, I haven't tried them

As I had to upgrade my Google Drive storage like a month ago, I gave them all a try. Short version: if you already have a paid plan with OpenAI/Claude, none of them come even close, for coding at least. I thought I was trying the wrong models at first, but after confirming, it seems like Google is just really far behind.

woah|11 months ago

Strange to read this and the parent comment, since Cursor has never made a single error applying patches for me. The closest it's come is when the coding model adds unnecessary changes, which of course is a completely different thing.

logicchains|11 months ago

Which model are you using with Cursor?

logicchains|11 months ago

o3-mini works well enough for me; it makes mistakes, but it can generally fix them eventually. Interestingly, I found that even if I include the line numbers as comments in the code it sees, it still often gets the line numbers wrong for edits (most often off-by-one errors, likely due to it mixing up whether the line numbers are inclusive or exclusive). What does work a bit better is asking it to provide regexes matching the first and last lines of what it wants to replace, along with nearby line numbers (so if there are multiple matches in that file for the regex, it gets the right one).
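
For anyone curious, here's a rough Python sketch of the regex-plus-nearby-line-number idea. The function name `replace_span` and its parameters are made up for illustration and aren't any editor's actual API. It assumes the model hands back two regexes, an approximate 1-indexed line number, and the replacement lines.

```python
import re


def replace_span(lines, first_re, last_re, near_line, new_lines):
    """Replace the block that starts at a line matching first_re and ends at
    the next line matching last_re. If several lines match first_re, use the
    one closest to near_line (1-indexed, only a hint)."""
    first, last = re.compile(first_re), re.compile(last_re)
    # Every line that could start the block (0-indexed positions).
    starts = [i for i, text in enumerate(lines) if first.search(text)]
    if not starts:
        raise ValueError("no line matches first_re")
    # The hint only disambiguates, so an off-by-one hint is harmless.
    start = min(starts, key=lambda i: abs(i - (near_line - 1)))
    # The block ends at the first line at or after start that matches last_re.
    for end in range(start, len(lines)):
        if last.search(lines[end]):
            return lines[:start] + list(new_lines) + lines[end + 1:]
    raise ValueError("no line matches last_re after the chosen start")


# Two identical "def f():" lines; the hint (line 5) selects the second one.
src = ["def f():", "    return 1", "", "def f():", "    return 2"]
out = replace_span(src, r"^def f\(\):", r"return 2", 5, ["def g():", "    return 3"])
```

The point is that the line number never has to be exact: the regexes pin down the span, and the number only breaks ties between multiple matches.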