I've had quite a bit of the "tell it to do something in a certain way", it does that at first, then a few messages of corrections and pointers, it forgets that constraint.
> it does that at first, then a few messages of corrections and pointers, it forgets that constraint.
Yup, most models suffer from this. Everyone is raving about million-token context, but none of the models can actually get past 20% of that and still give responses of the same quality as the very first message.
My whole workflow right now is basically composing prompts outside the agent, letting it run with them, and if something is wrong, restarting the conversation from 0 with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without any back and forth, precisely because of the issue you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
LLMs do a cool parlour trick; all they do is predict “what should the next word be?” But they do it so convincingly that in the right circumstances they seem intelligent. But that’s all it is; a trick. It’s a cool trick, and it has utility, but it’s still just a trick.
All these people thinking that if only we add enough billions of parameters when the LLM is learning and add enough tokens of context, then eventually it’ll actually understand the code and make sensible decisions? These same people perhaps also believe if Penn and Teller cut enough ladies in half on stage they’ll eventually be great doctors.
Yes, agreed. I find it interesting that people say they're building these huge multi-agent workflows, since the projects I've tried it on are not necessarily huge in complexity. I've tried a variety of different things re: instructions files, etc. at this point.
Been experimenting with the same flow as well; it's sort of the motivation behind this project - to streamline the generate code -> detect gaps -> update spec -> implement flow.
curious to hear if you are still seeing code degradation over time?
Create an AGENTS.md that says something like, "when I tell you to do something in a certain way, make a note of this here".
The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.
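A minimal sketch of what such an AGENTS.md section might look like. The heading, wording, and example rules here are purely illustrative assumptions, not a format any particular agent requires:

```markdown
# AGENTS.md

## Standing instructions
<!-- Agent: when the user says to always do something a certain way,
     record it as a bullet here instead of relying on chat context. -->

- Use spaces, not tabs, in Python files.
- Never auto-commit; leave staging and commits to the user.

<!-- Human: review this list periodically and prune entries that
     no longer apply, since it only ever grows. -->
```

The HTML comments double as the "make a note of this here" instruction to the agent and a reminder of the pruning caveat above.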