woah|17 days ago
> Here is why that is backwards. I just showed that a different edit format improves their own models by 5 to 14 points while cutting output tokens by ~20%. That’s not a threat. It’s free R&D.
He makes it sound like he got a 5-14% boost on a top-level benchmark, not a 5% improvement on a narrow find-and-replace metric. Anecdotally, I don't usually have a lot of issues with editing in Claude Code or Cursor, and if there is an issue the model corrects it.
Assuming that it costs double the tokens when it has to correct itself, and that find-and-replace errors are as prominent in actual day-to-day use as in his benchmark, we're talking a 5% efficiency gain in editing token use (not reasoning or tool use). Given that editing must account for less than a third of total token use (and I assume much less), we're talking an overall efficiency gain in the low single digits at best, likely under 1%.
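The back-of-envelope math above can be sketched as follows. All the input numbers are the commenter's assumptions (the 5% editing gain, the one-third upper bound on editing's share of tokens), not measured values:

```python
# Back-of-envelope estimate of overall token savings from a better edit format.
# Every input here is an assumption taken from the comment above.

edit_token_savings = 0.05   # assumed ~5% fewer tokens spent on editing itself
editing_share_max = 1 / 3   # assumed upper bound on editing's share of all tokens

# Overall savings = savings on editing tokens, scaled by editing's share
# of total token use. With the 1/3 upper bound this is the best case.
overall_savings = edit_token_savings * editing_share_max

print(f"best-case overall gain: {overall_savings:.1%}")  # about 1.7%

# If editing is a smaller share (say 15% of tokens), the gain drops under 1%.
print(f"with a 15% share: {edit_token_savings * 0.15:.2%}")
```

Under these assumptions the ceiling is roughly 1.7%, and it falls below 1% once editing accounts for less than a fifth of total token use.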
This seems like a promising technique, but maybe not a high priority among efficiency gains for these tools. The messianic tone, like assuming that Google cut off his access to suppress his genius editing technique rather than simply because he was hammering their API, also leaves a bad taste, along with the rampant and blatant ChatGPTisms in the blog post.
athrowaway3z|17 days ago
Not sure what they're calculating, but this seems to me like it could be many times better than a 20% gain.
athrowaway3z|16 days ago
https://github.com/offline-ant/pi-hh-read
bradfa|16 days ago
Most harnesses already have rather thorough solutions for this problem, but new insights are still worth understanding.
theahura|17 days ago
That's not a human. It's AI slop.
unknown|17 days ago
[deleted]
stingraycharles|17 days ago