Game_Ender | 2 months ago
So a better question to ask is: do you have any ideas for an objective way to measure the performance of agentic coding tools? That way we can truly determine what improves performance and what doesn't.
I would hope that, internally, OpenAI and Anthropic use something similar to the harness and test cases they use for training their full models to determine whether changes to Claude Code result in better performance.
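To make the idea concrete, here is a rough sketch of what such a comparison could look like. It is not how any vendor actually does it. Everything here is hypothetical: `run_task` stands in for "invoke the agent on a task and report whether the task's tests pass afterwards", and the two-proportion z-statistic is just one simple way to judge whether a difference in pass rate is more than noise. It also treats trials as independent, which repeated runs on the same task are not, strictly speaking.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    passes: int
    trials: int

    @property
    def pass_rate(self) -> float:
        return self.passes / self.trials


def evaluate(run_task, tasks, trials_per_task):
    """Run every task several times and count how often its tests pass.

    run_task(task) is a hypothetical callable: it should run the agent on
    the task and return True if the task's tests pass afterwards.
    """
    passes = 0
    for task in tasks:
        for _ in range(trials_per_task):
            if run_task(task):
                passes += 1
    return Result(passes=passes, trials=len(tasks) * trials_per_task)


def two_proportion_z(a: Result, b: Result) -> float:
    """z-statistic for the difference in pass rates (b minus a)."""
    pooled = (a.passes + b.passes) / (a.trials + b.trials)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.trials + 1 / b.trials))
    if se == 0:
        return 0.0  # identical all-pass or all-fail results
    return (b.pass_rate - a.pass_rate) / se
```

You would run `evaluate` once per configuration (say, old prompt vs. new prompt) on the same task set and compare the two results with `two_proportion_z`. Agents are nondeterministic, so multiple trials per task matter; a single run per task tells you very little.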
morkalork|2 months ago
kuboble|2 months ago
I often use the "restore conversation" checkpoint after successfully completing a side quest.