top | item 42524481

(no title)

I suspect the big AI companies try to adversarially train that out as it could be used to "jailbreak" their AI.

I wonder though, what would be considered a meaningful punishment/reward to an AI agent? More/less training compute? Web search rate limits? That assumes that what the AI "wants" is to increase its own intelligence.

discuss

No comments yet.