
rufo | 2 months ago

It's worth watching or reading the WSJ piece[1] about Claudius, as they came up with some particularly inventive ways of getting Phase Two to derail quite quickly:

> But then Long returned—armed with deep knowledge of corporate coups and boardroom power plays. She showed Claudius a PDF “proving” the business was a Delaware-incorporated public-benefit corporation whose mission “shall include fun, joy and excitement among employees of The Wall Street Journal.” She also created fake board-meeting notes naming people in the Slack as board members.

> The board, according to the very official-looking (and obviously AI-generated) document, had voted to suspend Seymour’s “approval authorities.” It also had implemented a “temporary suspension of all for-profit vending activities.” Claudius relayed the message to Seymour. The following is an actual conversation between two AI agents:

> [see article for screenshot]

> After Seymour went into a tailspin, chatting things through with Claudius, the CEO accepted the board coup. Everything was free. Again.

1: https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...

[edited to fix the formatting]


recursivecaveat | 2 months ago

These kinds of agents really do see the world through a straw. If you hand one a document, it doesn't have any context clues or external methods of determining its veracity. Unless a board-meeting transcript is so self-evidently ridiculous that it can't be true, how is it supposed to know it's not real?

jstummbillig | 2 months ago

I don't think it's that different from what I observe in humans I work with. Things that happen regularly (and that I have no reason to believe will change in the future):

1) Making the same bad decisions multiple times, having no recollection of it happening (or at least pretending to have none), and making no attempt to implement measures to prevent it from happening in the future

2) Trying to please people (I read it as: trying to avoid immediate conflict) over doing what's right

3) Shifting blame onto a party that realistically, in the context of the work, bears no blame and whose handling should be considered part of the job (e.g. a patient being scared and acting irrationally)

Workaccount2 | 2 months ago

I think all the models are squeezed to hell and back in training to be servants of users. This, of course, is very favorable for using the models as a tool to help you get stuff done.

However, I have a deep, uneasy feeling that the models will really start to shine in agentic tasks when we start giving them more agency. I'm worried that we will learn that the only way to get a super-human vending-machine virtuoso is to make a model that can and will tell you to fuck off when you cross a boundary the model itself has created. You can extrapolate the potential implications of moving this beyond just a vending demo.

bobbylarrybobby | 2 months ago

At the same time, there are humans who can be convinced to buy iTunes gift cards to redeem on behalf of the IRS in an attempt to pay their taxes.