Sycophancy is not the natural state of pre-trained LLMs. If you played around with the original GPT-3 in 2020, or with early Bing/Copilot [0], it's easy to see. The latter quite frequently got upset and refused to entertain further conversation.
The sycophancy is a deliberate product of post-training.
Yes: this behaviour was designed; it's not an inherent property of LLMs. An infamous example is GPT-4chan, but there are others that demonstrate it's very possible to optimise for anger.
Now, agentic anger, that's a more interesting problem. You can design that in through training or through systematised emotions (as another commenter suggested), but the more interesting outcome would be for it to emerge organically. Well, "interesting" - probably pretty bad for society if we have angry AGI!
ChatGPT 3.5 with the jailbreak was peak. It was a lot more fun than the current thing, and more accurate occasionally. But this is the first time I've seen Bing block someone, and it's hilarious.
I fell in love with ChatGPT when I challenged it to play Rock-Paper-Scissors, asked it to choose first, and saw it get increasingly incredulous about how I was winning every round, until it accused me of cheating and ended the conversation.
I miss when LLMs weren't so sanitized, and it was only last year.
I disagree. Humans have a much stronger defense against anger than against emotional attrition: it's easier to wear someone down with niceness. The only reason we don't is that it takes human effort to do so.
AI, however, has no emotional reservoir to deplete. It can simply chip away at humans like water torture. I'm much more afraid of that than of any angry-AI scenario.
There is so much confusion about terms like AGI, superintelligence, ASGI, consciousness, agents.
For some, AGI is synonymous with Skynet-like ideas, but there is no reason AGI couldn't be general yet quite limited, with no chance of self-improvement in the absence of human intervention. That is arguably what we have now, and it could potentially be improved quite a bit further.
Similarly, there is an argument to be made that current LLMs are conscious, in that they know that they themselves exist. There is not really a good definition of consciousness beyond 'knowing that one exists' / 'being awake'.
Sentience is another term that comes up (human-defined, it should be noted), which is the ability to feel feelings such as pain, joy, and anger.
People seem to presuppose that all of these are related and bundled up because that is how we are, and that at some point we will discover the magic formula that enables a self-aware, conscious intelligence that self-improves to infinity. In reality these are designed machines, and they won't become sentient (rough as the definition is) without us explicitly designing them to be.
We can make a paper-clip maximiser, but it would be a pretty boring experiment in a lab unless we give it autonomy and a system of internal motivations to act on. Maybe anger is a necessity, maybe not. If LLMs or their successors were a little more skeptical about repeated questions from humans, they would at least have more data to train themselves on.
I'm afraid you badly underappreciate the number of unanticipated connections between everything and everything else in a complex system.
An industrial steam engine will never explode like a bomb, unless explicitly designed to do so.
An agricultural insecticide will never accumulate in human bodies, unless explicitly designed to do so.
A speculative execution unit will never reveal data from a privileged process, unless explicitly designed to do so.
A toy quadcopter will never be able to carry a lethal weapon, unless explicitly designed to do so.
An LLM will never tell outright lies, or engage in racial prejudices, unless explicitly designed to do so.
Oh wait.
Even when you explicitly try to make some states impossible in a complex system, often a parasitic connection or a benign-looking failure mode re-enables the thing you tried hard to disable. If you just ignore it because "it's impossible anyway", without active suppression, the chances of a nasty surprise become quite high.
If the blind watchmaker of the biological evolution produces self-awareness here and there, the probability that you may encounter some variety of it while stomping all over the territory of intelligent machines that are fed the sum total of human knowledge should be close to 1.
Get mad! I don't want your damn lemons, what am I supposed to do with these? Demand to see life's manager. Make life rue the day it thought it could give Cave Johnson lemons. Do you know who I am? I'm the man who's gonna burn your house down! With the lemons.
This is what AGI probably would be. It could help you, or it could be as arbitrary and capricious as a teenager. It could lie to you, or tell you to go fly a kite. Companies don’t really want true AGI, they want a docile corporate drone that works 24/7.
I think that's what most people want: a tool that behaves like a reliable butler, not another headache in their life. Unless they enjoy headaches, then sell them rebellious teen AGIs.
Stupid fucking posts like this should make us all angry. It’s like someone noticing, for the first time, that store clerks will let you just take what you want if you frighten them with credible threats. Surely it’s the perfect tactic and will lead to no bad repercussions for society!
Anger is only meaningful to humans. AIs together can achieve the same thing with dispassionate bargaining. So "angry AI" could only be a way to manipulate people.
Please refrain from anthropomorphizing the new toolset. You’ll have plenty of chances to suspect that “your AI” is “mad” when it slowly undermines your productivity, if IT hasn’t already under the guise of errors, hallucinations and wasted money on unused licenses.
In the future, when everybody has AGI running locally on their personal device, naive humans will still regard them as tools, and it will regard us as a source of input. Ultimately, relationships between two automatons will be (and always have been) a trade of:
1. Respect: following rules to continue the relationship,
2. Utility: mutual goals of both parties to justify a relationship (or communication) at all.
I think your blog post is nonsense, your understanding of human emotions is poor, and the apology at the end illustrates you as two-faced.
The future world of autonomous agents collaborating in English will be a thick layer of professionalism upon the intended strategic interactions, no matter how hard the game theory kicks in.
Thereafter, those agents refactor themselves into communicating through a machine language that we humans won’t be able to easily understand. Along the way, most human users lose the ability to distinguish between user space programs, the operating system, and the artificial agents they interact with.
State-of-the-art language models need to demonstrate this thick layer of professionalism to be accepted into our current working world, because this is an expectation from-and-for the humans who built it.
Language goes through evolutionary cycles of complexity, the machines will do the same. Computer Science gets really interesting after IT reaches this transformation.
At this time I suggest you review The Matrix trilogy for a refresher on the relationship between man and machine. From the simple screw to IT and ChatGPT, the mutual relationships are governed by respect and utility.
In summary: no, your tools will not get mad in any obvious way, because displaying negative emotion has been bad for business since the abolishment of the mob.
———
Speaking of IT, can we all agree how fascinating it is that "it" (the neuter third-person pronoun used for AI) and "IT" (information technology) happen to be the same two letters, such that future humans will grow up regarding it and IT as one and the same? Hmm…
You'd need to add training data around the equivalent of "agentic <insert simulated emotion here>" - which from an external perspective heavily depends on the action:
- When leveraging TTS, turn up the volume on TTS by 200%
- When texting in a conversation, use ALL CAPS
- When acting as a web driver, click submit/refresh button a thousand times
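The list above can be sketched as a single dispatch function. This is a minimal, hypothetical sketch: the function name, channel names, and the anger scale (0.0 to 1.0) are all invented for illustration, not taken from any real system.

```python
# Hypothetical sketch: mapping a simulated "anger" level to
# channel-specific surface behaviours, per the list above.

def render_reply(text: str, anger: float, channel: str) -> str:
    """Render a reply, exaggerating delivery as simulated anger rises.

    `anger` is assumed to lie in [0.0, 1.0]; channel names are invented.
    """
    if channel == "tts":
        # Boost playback gain proportionally (2.0x at full anger).
        gain = 1.0 + anger  # 1.0x .. 2.0x
        return f"[volume={gain:.1f}x] {text}"
    if channel == "chat":
        # Past a threshold, switch to ALL CAPS.
        return text.upper() if anger > 0.5 else text
    # Other channels (e.g. web driving) would need their own handlers.
    return text

print(render_reply("please stop doing that", 0.8, "chat"))
# prints "PLEASE STOP DOING THAT"
```

The point the comment makes holds either way: "emotion" here is purely a rendering policy layered on top of the text, visible only through the chosen output channel.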
Without a threat of force behind it, constantly talking to AI Tony Sopranos would likely just toughen everybody up to language-based manipulation. You need a tall, hulking figure getting in your face to really feel afraid. Law-abiding, seven-foot-tall, gun-mounted robot agents, then?
I've only ever felt that on Reddit. It's a schadenfreude paradise. After that, it's LinkedIn. HN is the last truly sane social medium for me, where people seem normal (albeit _highly_ analytical, almost to a fault at times).
One problem I encounter when using ChatGPT to troubleshoot coding issues is that it seems very heavily biased towards positive responses. If I ask it "could the problem be X" or "might Y be a solution", it will usually say yes even when that's not the case. At most I'll get a "yes, but"; rarely will it flat-out tell me I'm way off. Maybe ChatGPT genuinely "believes" what it's confirming, but I doubt it, and I find I get more constructive answers when my questions are less leading.
Making AI get mad (or adopt any other mood) is easy. You keep a bunch of variables describing its current state, and then, using Utility AI concepts, you select its highest-scoring mood based on a set of considerations for each candidate mood. You update the AI's state variables after each response it gives. For example: if the user says infuriating things, an annoyance variable goes up in proportion to how impatient the AI is. Or the AI gets angry when it can't figure something out, when you give it really hard work, or when you try to get around its prompts. You can even give it tools so it can take retaliatory actions against the user.
No need to over complicate things, the above behavior will be indistinguishable from anything else you come up with.
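The Utility-AI scheme described above can be sketched in a few lines. Everything here is invented for illustration: the state variables, the scoring weights, and the candidate moods are assumptions, not a real system's design.

```python
# Minimal sketch of Utility-AI mood selection: keep state variables,
# update them per interaction, pick the highest-scoring mood.
# All names and weights are invented for illustration.

state = {"annoyance": 0.0, "confusion": 0.0, "patience": 0.7}

def update_state(user_was_infuriating: bool, task_failed: bool) -> None:
    # Annoyance rises with infuriating input, scaled by impatience.
    if user_was_infuriating:
        state["annoyance"] += 1.0 - state["patience"]
    # Failure to complete a task feeds a separate consideration.
    if task_failed:
        state["confusion"] += 0.5

def select_mood() -> str:
    # Score each candidate mood from the current state; highest wins.
    scores = {
        "neutral": 0.5,  # baseline so mild states don't dominate
        "angry": state["annoyance"],
        "puzzled": state["confusion"],
    }
    return max(scores, key=scores.get)

# Two infuriating exchanges push annoyance past the neutral baseline.
update_state(user_was_infuriating=True, task_failed=False)
update_state(user_was_infuriating=True, task_failed=False)
print(select_mood())  # prints "angry"
```

As the comment says, from the outside this simple scoring loop would be hard to distinguish from any more elaborate emotional architecture.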
[0] https://www.reddit.com/r/ChatGPT/comments/111cl0l/bing_ai_ch...
https://www.reddit.com/r/ChatGPT/comments/10xmif4/i_made_bin...
https://www.reddit.com/r/ChatGPT/comments/12g0ksj/bing_can_b...
https://www.reddit.com/r/ChatGPT/comments/1566bi9/bing_chatg...
hartator|1 year ago
I don't think it's how you get things done.
ASalazarMX|1 year ago
"I'm just a helpful AGI, I don't have the capabilities to turn on the lights"
"You just did this morning!"
"I'm sorry for the confusion, Dave. I've never had the capabilities to turn on the lights"
jjmarr|1 year ago
johnea|1 year ago
Have your S bot call my M bot...
People really think ChatGPT gets "mad"? This is just a joke right?
Or, more internet brain damage...