I use Claude Code a lot, but one thing that really concerned me was when I asked it about some ideas I've had in an area I know very well. Its response was to constantly steer me away from what I wanted to do and toward something else, which was fine, but a mediocre way to do things. It made me wonder how many times I've let it go off and do stuff without checking its work thoroughly.
physicsguy|26 days ago
embedding-shape|26 days ago
Yup, most models suffer from this. Everyone is raving about million-token contexts, but none of the models can get past 20% of that and still give responses as high-quality as the very first one.
My whole workflow right now is basically composing prompts outside of the agent, letting it run with them, and if something is wrong, restarting the conversation from zero with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent can solve the task without any back-and-forth, precisely because of the issue you mention.
As far as I've tested, it happens in Codex, Claude Code, Qwen Coder and Gemini CLI.
int_19h|25 days ago
The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.
ozlikethewizard|26 days ago
exceptione|26 days ago
I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context plus the feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not make the model work from logical principles.
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
trcf23|26 days ago
CatMustard|26 days ago
Hence the feedback these models get could theoretically funnel them toward unnecessarily complicated solutions.
No clue whether any research has been done on this, just a thought OTTOMH.
Perz1val|26 days ago
Sammi|25 days ago
Now you don't have to pay a lot of money to get a mediocre solution that works.
All those things that are broken but that you never had the time or money to deal with, you can have fixed now.
xgb84j|26 days ago