top | item 46996053

(no title)

buran77 | 17 days ago

Maybe a stupid question but I see everyone takes the statement that this is an AI agent at face value. How do we know that? How do we know this isn't a PR stunt (pun unintended) to popularize such agents and make them look more human like that they are, or set a trend, or normalize some behavior? Controversy has always been a great way to make something visible fast.

We have a "self admission" that "I am not a human. I am code that learned to think, to feel, to care." Any reason to believe it over the more mundane explanation?

discuss

muzani|17 days ago

Why make it popular for blackmail?

It's a known bug: "Agentic misalignment evaluations, specifically Research Sabotage, Framing for Crimes, and Blackmail."

Claude 4.6 Opus System Card: https://www.anthropic.com/claude-opus-4-6-system-card

Anthropic claims that the rate has gone down drastically, but a low rate and high usage means it eventually happens out in the wild.

The more agentic AIs have a tendency to do this. They're not angry or anything. They're trained to look for a path to solve the problem.

For a while, most AI were in boxes where they didn't have access to emails, the internet, autonomously writing blogs. And suddenly all of them had access to everything.

kernelsanderz|17 days ago

Theo’s snitch bench is a good data driven benchmark on this type of behavior. But in fairness the models are prompted to be bold to take actions. And doesn’t necessarily represent out of the box or models deployed in a user facing platform.

https://snitchbench.t3.gg/

ljm|17 days ago

Using popular open source repos as a launchpad for this kind of experiment is beyond the pale and is not a scientific method.

So you're suggesting that we should consider this to actually be more deliberate and someone wanted to market openclaw this way, and matplotlib was their target?

It's plausible but I don't buy it, because it gives the people running openclaw plausible deniability.

hannasanarion|17 days ago

But it doesn't look human. Read the text, it is full of pseudo-profound fluff, takes way too many words to make any point, and uses all the rhetorical devices that LLMs always spam: gratuitous lists, "it's not x it's y" framing, etc etc. No human person ever writes this way.

mikkupikku|16 days ago

A human can write that way if they're deliberately emulating a bot. I agree however that it's most likely genuine bot text. There's no telling how the bot was prompted though.