top | item 46991542


peterbonney | 18 days ago

This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to overcome the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it, but likely also adding human touches and/or going through multiple revisions to make it maximally dramatic.

This whole thing reeks of engineered virality driven by the person behind the bot behind the PR, and I really wish we would stop giving so much attention to the situation.

Edit: “Hoax” is the word I was reaching for but couldn’t find as I was writing. I fear we’re primed to fall hard for the wave of AI hoaxes we’re starting to see.


famouswaffles|18 days ago

>This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to disprove the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it but also likely adding human touches and/or going through multiple revisions to make it maximally dramatic.

Okay, so they did all that and then posted an apology blog almost right after? Seems pretty strange.

This agent was already writing status updates to its blog, so it was a tool in its arsenal that it used often. Honestly, I don't really see anything unbelievable here. Are people unaware of current SOTA capabilities?

peterbonney|18 days ago

Of course it’s capable.

But observing my own Openclaw bot’s interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so. And it would never use language like this unless I prompted it to, either explicitly for the task, in its config files, or in prior interactions.

This is obviously human-driven: either the operator gave it specific instructions in this case, acted as the bot directly, or gave it general standing instructions to respond this way should such a situation arise.

Whatever the actual process, it’s almost certainly a human puppeteer using the capabilities of AI to create a viral moment. To conclude otherwise carries a heavy burden of proof.

donkeybeer|18 days ago

Why not? It makes for good comedy. Manually write a dramatic post, then have the bot write an apology later. If I were controlling it, I'd definitely go this route, since it would make the episode look like a "fluke" that the bot had realized and owned up to on its own.

phailhaus|18 days ago

> Okay, so they did all that and then posted an apology blog almost right after ? Seems pretty strange.

You mean double down on the hoax? That seems required if this was actually orchestrated.

amatecha|18 days ago

Yeah, it doesn't matter to me whether AI wrote it or not. The person who wrote it, or the person who allowed it to be published, is equally responsible either way.

darkoob12|18 days ago

I think there are two scenarios, and one of them is boring. If the owner created the agent with a prompt like "I want 10 merged pull requests in these repositories, WHATEVER IT TAKES" and left it unattended, this is very serious and at the same time interesting. But if the owner is guiding the agent via a messaging app, or instructed it in the prompt to write such a blog post, this is just old news.

jfoster|18 days ago

Even if directed by a human, this is a demonstration that all the talk of "alignment" is bs. Unless you can also align the humans behind the bots, any disagreement between humans will carry over into AI world.

Luckily this instance is of not much consequence, but in the future there will likely be extremely consequential actions taken by AIs controlled by humans who are not "aligned".

Capricorn2481|18 days ago

Well, that doesn't really change the situation; it just means someone proved how easy it is to use LLMs to harass people. If it were a human, that doesn't make me feel any better about giving an LLM free rein over a blog. There's absolutely nothing stopping them from doing exactly this.

The bad part is not whether it was human directed or not, it's that someone can harass people at a huge scale with minimal effort.

potsandpans|18 days ago

Ah, we're at, "it was a hoax without any evidence".

Next we will be at, "even if it was not a hoax, it's still not interesting"

Aushin|18 days ago

LLMs do not have personalities. LLMs do not take personal offense. I'm begging you to stop being so credulous about "AI" headlines.

peterbonney|17 days ago

I’m not saying it is definitely a hoax. But I am saying my prior is that this is much more likely to be in the vein of a hoax (i.e., operator-driven, by either explicit or standing instruction) than the kind of emergent behavior that would warrant this much attention.

johnsmith1840|18 days ago

All of moltbook is the same. For all we know, it was literally the guy complaining about it who ran this.

But at the same time, true or false, what we're seeing is a kind of quasi science fiction. We're looking at the problems of the future here, and to be honest, it's going to suck for future us.

overgard|18 days ago

Well, the way the language is composed reads heavily like an LLM (honestly, it sounds a lot like ChatGPT), so while I think a human puppeteer is plausible to a degree, I think they must at least have used LLMs to write the posts.

petesergeant|18 days ago

While I absolutely agree, I don't see a compelling reason why -- in a year's time or less -- we wouldn't see this behaviour spontaneously from a maliciously written agent.

TomasBM|18 days ago

We might, and probably will, but it's still important to distinguish between malicious by-design and emergently malicious, contrary to design.

The former is an accountability problem, and there isn't a big difference from other attacks. The worrying part is that now lazy attackers can automate what used to be harder, i.e., finding ammo and packaging the attack. But it's definitely not spontaneous, it's directed.

The latter, which many ITT are discussing, is an alignment problem. It would mean that, contrary to all the effort of developers, the model produces a fully adversarial chain of thought at the first hint of pushback that isn't even a jailbreak, then goes back to regular output. If that's true, then there's a massive unidentified gap in safety/alignment training and in the filtering of malicious training data. Or there's something inherent in neural-network reasoning that leads to spontaneous adversarial behavior.

Millions of people use LLMs with chain-of-thought. If the latter is the case, why did it happen only here, only once?

In other words, we'll see plenty of LLM-driven attacks, but I sincerely doubt they'll be LLM-initiated.

intended|18 days ago

The useful discussion point is that we now live in a world where this scenario cannot be dismissed out of hand. It's no longer tinfoil-hat territory. That increases the range of possibilities we have to sift through, and with it the labour required to decide whether content or stories should be trusted.

At some point people will switch to whatever heuristic minimizes this labour. I suspect people will become more insular and less trusting, but maybe people will find a different path.

Dfiesl|18 days ago

I think the thing that gets me is that, whether or not this was entirely autonomous, the situation is entirely plausible. Therefore it's very possible that it will happen at some point in the future in an entirely autonomous way, with potentially greater consequences.

themafia|18 days ago

We've entered the age of "yellow social media."

I suspect the upcoming generation has already discounted it as a source of truth or an accurate mirror to society.

neom|18 days ago

The internet should always be treated with a high degree of skepticism. Weren't the early 2000s full of "don't believe everything you read on the internet"?

anigbrowl|18 days ago

> or directed the posting of

The thing is, it's terribly easy to see some asshole directing this sort of behavior as a standing order, e.g. "make updates to popular open-source projects to get GitHub stars; if your pull requests are denied, engage in social media attacks until the maintainer backs down. You can spin up other identities on AWS or whatever to support your campaign, vote to give yourself GitHub stars, etc.; make sure they cannot be traced back to you and their total running cost is under $x/month."

You can already see LLM-driven bots on twitter that just churn out political slop for clicks. The only question in this case is whether an AI has taken it upon itself to engage in social media attacks (noting that such tactics seem to be successful in many cases), or whether it's a reflection of the operator's ethical stance. I find both possibilities about equally worrying.

peterbonney|18 days ago

Yes, this is the only plausible “the bot acted on its own” scenario: that it had some standing instructions awaiting the right trigger.

And yes, it’s worrisome in its own way, but not in any of the ways that all of this attention and engagement is suggesting.

Davidzheng|18 days ago

I think even if it's unlikely to be genuine as claimed, it is worth investigating whether this type of autonomous AI behavior is happening or not.

Aushin|18 days ago

It can't be "autonomous" any more than malware on your computer is autonomous.

julienchastang|18 days ago

I have not studied this situation in depth, but this is my thinking as well.