top | item 44272540

slooonz | 8 months ago

They failed hard with Claude 4 IMO. I just can't get any feedback other than "What a fascinating insight" followed by a reformulation (and, to be generous, an exploration) of what I said, even when Opus 3 has no trouble finding limitations.

By comparison, o3 is brutally honest (I regularly get answers that flatly start with "No, that’s wrong"), and it’s awesome.

SamPatt|8 months ago

Agreed that o3 can be brutally honest. If you ask it for direct feedback, even on personal topics, it will make observations that, if a person made them, would be borderline rude.

silversmith|8 months ago

Isn't that what "direct feedback" means?

I firmly believe you should be able to hit your fingers with a hammer, and in the process learn whether that's a good idea or not :)

skissane|8 months ago

o3 can be very honest.

But I also find it can get very fixated on the idea that some position it has adopted is right. It will then start hallucinating like crazy in defence of that fixation and get stuck in a loop of defending its hallucinations with even more hallucinations. By hallucinations I mean stuff like producing lengthy citation lists of invented articles; then, when you point out they don’t exist, claiming stuff like “well when I search PubMed they do”; and when you point out its DOIs are made up, it apologises for the “mistake” and just makes up some more.

rapind|8 months ago

Thank god.

simonw|8 months ago

Thanks for this, I just tried the same "give me feedback on this text" prompt against both o3 and Claude 4 and o3 was indeed much more useful and much less sycophantic.

WaltPurvis|8 months ago

Do knowledge cutoff dates matter anymore? The cutoff for o3 was 12 months ago, while the cutoff for Claude 4 was five months ago. I use these models mostly for development (Swift, SwiftUI, and Flutter), and these frameworks are constantly evolving. But with the ability to pull in up-to-date docs and other context, is the knowledge cutoff date still any kind of relevant factor?