

woeirua | 20 days ago

That's such a huge delta that Anthropic might be onto something...


Anthropic has been the only AI company that actually cares about AI safety. Here's a dated benchmark, but it's a trend I've never seen disputed: https://crfm.stanford.edu/helm/air-bench/latest/#/leaderboar...

CuriouslyC|20 days ago

Claude is more susceptible than GPT-5.1+. It tries to be "smart" about context when deciding whether to refuse, but that just makes it easier to trick, whereas newer GPT-5 models just refuse across the board.

nradov|20 days ago

That is not a meaningful benchmark. They just made shit up. Regardless of whether any company cares or not, the whole concept of "AI safety" is so silly. I can't believe anyone takes it seriously.

LeoPanthera|20 days ago

This might also be why Gemini is generally considered to give better answers, except when it comes to code.

Perhaps thinking about your guardrails all the time makes you think about the actual question less.

mh2266|20 days ago

Re: that, CC burning its context window on this silly warning for every single file is rather frustrating: https://github.com/anthropics/claude-code/issues/12443

rahidz|20 days ago

Or Anthropic's models are intelligent/trained on enough misalignment papers, and are aware they're being tested.

bofadeez|20 days ago

Huh? https://alignment.anthropic.com/2026/hot-mess-of-ai/