ezyang|11 months ago
Yeah, Sonnet 3.5/3.7 are doing the heavy lifting. Maybe the SOTA Gemini models would do better; I haven't tried them. Generating correct patches is a funny minigame that isn't really solved, despite how easy it is to RL on.
diggan|11 months ago
Since I had to upgrade my Google Drive storage about a month ago, I gave them all a try. Short version: if you already have a paid plan with OpenAI/Claude, none of them come even close, for coding at least. I thought I was trying the wrong models at first, but after confirming, it seems like Google is just really far behind.
woah|11 months ago
logicchains|11 months ago
logicchains|11 months ago