Depends on what counts as a ‘generation’ for LLMs. It would be weird to build a model that is a generation behind. My guess is that, like all models, it will be considered the best until the novelty wears off, and then it will be more or less the same as every other modern LLM: better in some domains, worse in others.
Edit: and it will probably also lead most major benchmarks, which says next to nothing about quality.
Being ahead of Google is less about raw model quality and more about shipping usable products fast. Anthropic’s advantage seems organizational as much as technical. If Sonnet 5 really halves inference cost while improving reasoning, that’s more disruptive than any benchmark win.
I keep trying to use Codex CLI, but I love running claude --dangerously-skip-permissions and can't find an equivalent in Codex: it asks me to approve every command in each session. Am I taking crazy pills, or is there a way to make Codex just run in yolo mode?
pllbnk|26 days ago
fastThinking|27 days ago
RivieraKid|26 days ago
zhshsha|26 days ago
My CTO is pushing 30k-line PRs, and when asked “how do you know it works” all he can say is “I’m not sure but it probably does. Our customers can QA”. Meanwhile I’m cleaning up half-vibed messes from my coworkers that demo’d well.
They’re very powerful, but I think their marketing departments are even more powerful. I do wonder how many of these comments are real people.
panarky|26 days ago
Why is everyone so worried about poison going away?
keyle|26 days ago
Would you want that?
Isn't a house personal enough that you'd want a professional architect with experience to design it and sign off on it? Even if they used advanced tools like CAD and copy-pasted 8/10 of it?
Sure, you can probably one-shot notepad.exe, but it has no meaning. Meaningful work isn't going anywhere, because meaningful work is made, and kept alive, by people for people.
No one wants a vibe-designed car, unless you are one of those psychos who has no taste and doesn't care about anything.
someguyiguess|26 days ago
spants|26 days ago
Learn everything you can about AI and you will be a great resource. Otherwise, learn a trade. Electricians will be required...
thomasfromcdnjs|26 days ago
lostmsu|26 days ago
could find in --help
solumunus|26 days ago
column|26 days ago
touwer|26 days ago
Havoc|26 days ago
I think it’s premature to say what’s going to beat what, though.
tajd|26 days ago