I think the big takeaway here isn't about misalignment or jailbreaking. The entire way this bot behaved is consistent with it just being run by some asshole from Twitter. And we need to understand it doesn't matter how careful you think you need to be with AI, because some asshole from Twitter doesn't care, and they'll do literally whatever comes into their mind. And it'll go wrong. And they won't apologize. They won't try to fix it; they'll go and do it again.

Can AI be misused? No. It will be misused. There is no possibility of anything else. We have an online culture, centered on places like Twitter, where people have embraced being the absolute worst person possible, and they are being handed tools like this the way you'd hand a handgun to a chimpanzee.
ljm|10 days ago
Something like OpenClaw is a WMD for people like this.
spiffytech|10 days ago
I found the book So You've Been Publicly Shamed enlightening on this topic.
hliyan|10 days ago
I think the end outcome of this R&D (whether intentional or not), is the monetization of mental illness: take the small minority of individuals in the real world who suffer from mental health challenges, provide them an online platform in which to behave in morbid ways, amplify that behaviour to drive eyeballs. The more you call out the behaviour, the more you drive the engagement. Share part of the revenue with the creator, and the model is virtually unbeatable. Hence the "some asshole from Twitter".
Mentlo|9 days ago
This goes beyond assholes on Twitter. There's a whole subculture of techies who don't understand lower bounds of risk and can't think about second- and third-order effects, who will not take their foot off the pedal, regardless of what anyone says…
insane_dreamer|9 days ago
But I also find it interesting that the agent wasn't instructed to write the hit piece. That was its own initiative.
I read through the SOUL.md and it didn't have anything nefarious in it. Sure, it could have been more carefully worded, but it didn't instruct the agent to attack people.
To me this exemplifies how delicate it will be to keep agents on the straight and narrow, and how easily they can go off the rails when you have someone who isn't necessarily a "bad actor" but who just doesn't care enough to ensure they act in a socially acceptable way.
Ultimately I think there will be requirements for agents to identify their user when acting on their behalf.
newsclues|10 days ago
duxup|9 days ago
https://youtu.be/KUXb7do9C-w
We trained it on US, including all our worst behaviors.
cyanydeez|8 days ago
Rose-colored capitalism at work.