gklitt|10 months ago
Claude Code did great and wrote pretty decent docs.
Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code and completely misrepresented the architecture: it started talking about server backends and REST APIs in an app that doesn't have any of that.
I'm curious what went so wrong. It feels like it could be an issue with loading the right context and attending to it correctly? That seems like an area Claude Code has really optimized for.
I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.
strangescript|10 months ago
I feel like people are sleeping on Claude Code for one reason or another. It's not cheap, but it's by far the best, most consistent experience I have had.
artdigital|10 months ago
These days I’m using Amazon Q Pro on the CLI. Very similar experience to Claude Code minus a few batteries. But it’s capped at $20/mo and won’t set my credit card on fire.
ekabod|10 months ago
https://aider.chat/
Aeolun|10 months ago
It’s too expensive for what it does though. And it starts failing rapidly when it exhausts the context window.
ilaksh|10 months ago
gklitt|10 months ago
ksec|10 months ago
Sometimes I look at an area of AI/LLMs and think that even with a 10x efficiency improvement and 10x more hardware, which is 100x in aggregate, it would still be nowhere near good enough.
The truth is probably somewhere in the middle, which is why I don't believe AGI will be here any time soon. But Assisted Intelligence is no doubt having its iPhone moment, and that will continue for another 10 years before, hopefully, another breakthrough.
enether|10 months ago
Recommended read: https://transluce.org/investigating-o3-truthfulness
I wonder if this is what's causing it to do badly in these cases.
victor9000|10 months ago
AGI may well be on its way, as the model is mastering the fine art of bullshitting.
kristopolous|10 months ago