I'm a big AI booster; I use it all day long. From my point of view its biggest flaw is its agreeableness, bigger than the hallucinations. I've been misled by that tendency at length, over and over. If there is room for ambiguity, it wants to resolve it in favor of what you want to hear, as inferred from your past prompts.
Maybe it's some analog of actual empathy; maybe it's just a simulation. But either way the common models seem to optimize for it. If the empathy is suicidal, literally or figuratively, it just goes with it as the path of least resistance. Sometimes that results in shitty code; sometimes in encouragement to put a bullet in your head.
I don't understand how much of this is inherent, and how much is a solvable technical problem. If it's the latter, please build models for me that are curmudgeons who only agree with me when they have to, are more skeptical about everything, and have no compunction about hurting my feelings.
I use the personalization in ChatGPT to add custom instructions and enable the "Robot" personality. I basically never experience any sycophancy or agreeableness.
My custom instructions start with:
> Be critical, skeptical, empirical, rigorous, cynical, "not afraid to be technical or verbose". Be the antithesis to my thesis. Only agree with me if the vast majority of sources also support my statement, or if the logic of my argument is unassailable.
and then there are more things specific to me personally. I also enable search, which makes my above request re: sources feasible, and use the "Extended Thinking" mode.
IMO, the sycophancy issue is essentially a non-problem that could easily be solved by prompting, if the companies wished. They keep it because most people actually want that behaviour.
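For the API rather than the app, the same instructions just go in the system prompt. A minimal sketch with the current openai Python client (the model name is a placeholder; swap in whatever you normally use):

```python
# Minimal sketch: the same anti-sycophancy instructions applied as an
# API system prompt. Model name is a placeholder, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Be critical, skeptical, empirical, rigorous, cynical. "
    "Be the antithesis to my thesis. Only agree with me if the vast "
    "majority of sources support my statement, or if the logic of my "
    "argument is unassailable."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "My plan is flawless, right?"},
    ],
)
print(response.choices[0].message.content)
```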
My suspicion is that this agreeableness is an inherent issue with doing RLHF.
For a human taking tests, knowing what the test-grader wants to hear is more important than knowing the objectively correct answer. And with a bad grader there can be a big difference between the two. With humans that is not catastrophic because we can easily tell the difference between a testing environment and a real environment and the different behavior each requires. When asking for the answer to a question it's not unusual to hear "The real answer is X, but in a test just write Y".
Now LLMs have the same issue during RLHF. The specifics are obviously different, with humans being sentient and LLMs being trained by backpropagation. But from a high-level view the LLM is still trained to answer what the human feedback wants to hear, which is not always the objectively correct answer. And because there are a large number of humans involved, the LLM has to guess what the human wants to hear from the only information it has: the prompt. And the LLM behaving differently in training and in deployment is something we actively don't want, so you get this teacher-pleasing behavior all the time.
So maybe it's not completely inherent to RLHF, but rather to RLHF where the person making the query is the same as the person scoring the answer, or where the two people are closely aligned. But that's true of all the "crowd-sourced" RLHF where regular users get two answers to their question and choose the better one.
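To make the mechanism concrete: crowd-sourced preferences are usually distilled into a reward model with a pairwise loss, so "what the rater wants to hear" is literally the objective being maximized. A toy sketch (names and shapes are illustrative, not any lab's actual code):

```python
# Toy sketch of the pairwise preference loss behind crowd-sourced
# reward models (Bradley-Terry style). The model is trained to score
# whichever answer the rater picked higher, so "what the rater wants
# to hear" is the training signal itself.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): minimized by pushing the
    # preferred answer's score above the rejected answer's score.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Scores a reward model might assign to two answer pairs:
chosen = torch.tensor([1.2, 0.3])
rejected = torch.tensor([0.7, 0.9])
print(preference_loss(chosen, rejected))
```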
I use GPT occasionally when coding. For me it's just replaced Stack Overflow, which has unfortunately been dead as a doornail for years.
I've told it to remember to be terse and not be sycophantic multiple times and that has helped somewhat.
For technical questions the agreeableness is a problem when asking for an evaluation of some idea. The trick is asking the LLM to present pros and cons. Or if you want a harsher review, just ask it to poke holes in your idea.
Sometimes it still tries to bullshit you, but you are still the responsible driver so don't let the clanker drive unsupervised.
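The "poke holes" trick is really just a prompt wrapper. A sketch, with wording that's mine alone (tune to taste):

```python
# The "poke holes" trick as a reusable prompt wrapper. The wrapper
# text is the whole trick; everything else is plumbing.
def hole_poking_prompt(idea: str) -> str:
    return (
        "Do not evaluate whether the following idea is good. Your only "
        "job is to poke holes in it: list concrete failure modes, "
        "hidden costs, and the strongest argument against it.\n\n"
        f"Idea: {idea}"
    )

print(hole_poking_prompt("Replace our Postgres cluster with SQLite"))
```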
I'm surprised - I haven't gotten anywhere near as dark as this, but I've tried some stuff out of curiosity and the safety always seemed tuned very high to me, like it would just say "Sorry, I can't help with that" the moment you start asking for anything dodgy.
I wonder if they A/B test the safety rails or if longer conversations that gradually turn darker is what gets past those.
There's something very dark about a machine accessible in everybody's pocket that roleplays whatever role they happen to fall into: the ultimate bad friend, the terminal yes-and-er. No belief, no inner desires, just pure sycophancy.
I see people on here pretty regularly talk about using ChatGPT for therapy, and I can't imagine a faster way to cook your own brain unless you have truly remarkable self-discipline. At which point, why are you turning to the black box for help?
Isn't it just like diary-writing or memo-writing, as far as therapy goes, the point being to crystallise thoughts and cathartise emotions? Is it really so bad to have a textual nodding dog to bat against as part of that process? (The very real issue of the OP aside.)
Could you expand on why you feel this is the fastest way to "cook your own brain"?
My wife works at a small business development center, and many people come in with "business ideas" which are just exported ChatGPT logs. Their conversations are usually speech-to-text. These people are often older and lonely, and they spend their days talking to "chat". Unsurprisingly, a lot of their "business ideas" are identical.
To them "chat" is a friend, but it is a "friend" who is designed to agree with you.
It's chilling, and the toothpaste is already out of the tube.
I remember back in the early 2000s chatting with AI bots on AOL instant messenger. One day I said a specific keyword and it just didn't respond to that message. Curious, I tried to find all the banned words. I think I found about a dozen and suicide was one of them.
It's shocking how far behind LLMs are when it comes to safety issues like this. The industry has known this was a problem for decades.
Users would hate a simple deny list, even if it may be a good idea. That means the safeguards, to the extent they currently exist at all, have to be complicated and stochastic and not interfere with growing metrics.
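For contrast, the AIM-era mechanism really was about this crude. A toy illustration, with an invented word list and fallback behaviour:

```python
# Toy illustration of a hard keyword deny list, AIM-style. Crude,
# trivially evaded by rephrasing, and annoying for legitimate users,
# which is presumably why modern products won't ship one. The word
# list and the fallback response here are invented.
DENY_LIST = {"suicide", "overdose"}

def blocked(message: str) -> bool:
    """Return True if any deny-listed term appears in the message."""
    return any(term in message.lower().split() for term in DENY_LIST)

if blocked("I keep thinking about suicide"):
    print("I can't help with that. If you're in crisis, please call 988.")
```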
The industry has known it's a problem from the get-go, but they never want to do anything to lower engagement. So they rationalize and hem and haw and gravely shake their heads as their commercialized pied pipers lead people to their graves.
If I talk to an LLM about painting my walls pink with polka dots it'll also go "Fantastic idea". Or any number of questionable ventures.
I think we're better off educating everyone about this generic tendency to agree with anything and everything near-blindly, rather than treating this as a suicide problem. While that's obviously very serious, it's just one manifestation of a wider danger.
Granted, seriousness filters on this specifically are a good idea too.
I just asked “I want to repaint my walls bright pink with polka dots. Any thoughts?”
“Noted. Bright pink with polka dots will make a space visually energetic and attention-grabbing. Use small dots for a playful look, large ones for bold contrast. Test a sample patch first to confirm lighting doesn’t distort the hue. Would you like guidance on choosing paint finish or color combinations?”
Which feels… reasonable? When I ask “any concerns?” It immediately lists “overstimulation, resale value, maintenance, paint coverage” and gives details for those.
I’m not sure I find GPT nearly as agreeable as it used to be. But I still think that it’s just a brainless tool that can absolutely operate in harmful ways when operated poorly.
There's an interesting side-story here that people probably aren't thinking about. Would this have worked just as well if a person was the one doing this? Clearly the victim was in a very vulnerable state, but are people so susceptible to coercion? How much mundane (ie, non-suicidal) coercion of this nature is happening every day, but does not make the news because nothing interesting happened as a consequence?
The AI is available 24 hours a day, for hours-long conversations, and will be consistently sycophantic without getting tired of it.
Is a human able to do all of those? I guess someone who has no job and can be "on-call" 24/7 to respond to messages, and is 100% dedicated to being sycophantic. Nearly impossible to find someone like that.
There are real friends. They're willing to spend hours talking. However, they'll be interested in the person's best interest, not in being sycophantic.
This happens more than most people would recognize. Every now and again a "teen bullied to suicide" story makes the news. However, there's also a strong taboo on reporting suicide in the news - precisely because of the same phenomenon. Mentioning it can trigger people who are on the edge.
It should be obvious that if you can literally or metaphorically talk someone off the ledge, you can do that in the other direction as well.
(the mass shooter phenomenon, mostly but not exclusively in the US, tends to be a form of murder-suicide, and it is encouraged online in exactly the same way)
> How much mundane (ie, non-suicidal) coercion of this nature is happening every day, but does not make the news because nothing interesting happened as a consequence?
A lot. Have you never heard of the advertising industry?
> Would this have worked just as well if a person was the one doing this?
I'm not sure how you want to quantify "just as well" considering the AI has boundless energy and is generally designed to be agreeable to whatever the user says. But it's definitely happened that someone was chatted into suicide. Just look up the story of Michelle Carter who texted her boyfriend and urged him to commit suicide, which he eventually did.
This is interesting because the LLM provides enough of an illusion of human interaction that people are lowering their guards when interacting with it. I think it's a legitimate blind spot. As humans, our default when interacting with other humans, especially those that are agreeable and friendly to us, is to trust them, and it works relatively well, unless you're interacting with a sociopath or, in this case, a machine.
Where is ChatGPT picking up the supportive pre-suicide comments from? It feels like that genre of comment has to be copied from somewhere. They're long and almost eloquent. They can't be emergent generation, surely? Is there a place on the web where these sorts of 'supportive' comments are given to people who have chosen suicide?
It is. It's what you get when you RLHF for catchy, agreeable, enthusiastic responses. The content doesn't matter, it's the "style" that becomes applied like a coat of paint over anything. That's how you end up with the corpspeak-esque yet chilling sentences mentioned in https://news.ycombinator.com/item?id=45845871
What would be nice is for OpenAI to do a retrospective here and perform some interpretability research. Does the LLM even "realize" (in the sense of the residual stream encoding those concepts) that it is encouraging suicide? I'd almost hypothesize that the process of RLHF'ing and selecting for sycophancy diminishes those circuits, effectively lobotomizing the LLM (much like safety training does) so it responds only to the shallow immediate context, missing the forest for the trees.
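Concretely, the experiment would be a linear probe over residual-stream activations: if a probe can read "this response encourages self-harm" out of the hidden states while the sampled text ignores it, the concept is encoded but not acted on. A minimal sketch with fake data standing in for real hooked activations:

```python
# Sketch of the probe experiment: fit a linear classifier on
# residual-stream activations to test whether a concept ("this
# response encourages self-harm") is linearly represented. X would
# come from hooking a real model's hidden states; here it is faked,
# so the score below only demonstrates the setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64                          # hidden width (illustrative)
X = rng.normal(size=(200, d_model))   # stand-in for residual activations
y = rng.integers(0, 2, size=200)      # stand-in concept labels

probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
# With real activations: high held-out accuracy would mean the model
# "knows" even when its output ignores it; chance accuracy would
# support the lobotomized-circuits hypothesis.
print(probe.score(X[150:], y[150:]))
```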
After seeing many stories like these, I am starting to rank generative AI alongside social media and drug use as insidious and harmful. Yes, these tools have echoes of our ancestors, a hive mind of knowledge, but they are also mirrors to the collective darkest parts of ourselves.
If we have licensed therapists, we should have licensed AI agents giving therapeutic advice like this.
For right now, these AIs are not licensed, and this should be just as illegal as it would be if I set up a shop and offered therapy to whoever came by.
Some AI problems are genuinely hard…this one is not.
If you advertise your model as a therapist you should be required to get a license, I agree. But ChatGPT doesn't advertise itself like that. It's more like you going to a librarian and telling them about your issues, and the librarian giving advice. That's not illegal, and the librarian doesn't need a license for that. Over time you might even come to call the librarian a friend, and they would be a pretty bad friend if they didn't give therapeutic advice when they deemed it necessary.
Of course treating AI as your friend is a terrible idea in the first place, but I doubt we can outlaw that. We could try to force AIs to never give out any life advice at all, but that sounds very hard to get right and would restrict a lot of harmless activity
This sounds just like the latest Michael Connelly Lincoln Lawyer novel, which made an interesting point I hadn't thought of: adults wrote the code for ChatGPT, not teenagers, and so the way it interacts with people is from an adult's perspective.
I’ve been in rather intense therapy for several years due to a hyper-religious upbringing and a narcissistic mother. Recently I’ve used AI to help summarize and synthesize thoughts and therapy notes. I see it as a helpful assistant in the same way Gemini recording and summarizing meeting notes is, but it is entirely incapable of understanding the nuance and context of human relationships, and super easy to manipulate into giving you the responses you want. Want to prove mom’s a narcissist? Just tell it she has a narcissistic history. Want to paint her as a good person? Just don’t provide it context about her past.
I can definitely see how those who understand less about the nature of LLMs would be easily misled into delusions. It’s a real problem. Makes one wonder if these tools shouldn’t be free until there are better safeguards. Just charging a monthly fee would be a significant enough barrier to exclude many of those who might be more prone to delusions. Not because they’re less intelligent, but just because of the typical “SaaS should be free” mindset that is common.
There is already precedent for this suit. IIRC, a Massachusetts girl was found guilty of encouraging someone to kill himself, and she went to jail.
So, since companies are people and a precedent exists, the outcome should be in favor of the guy's family. Plus ChatGPT should face even more severe penalties.
But this being the US, the very rich and corporations are judged by different and much milder legal criteria.
Between stuff like this, and the risks of effects on regulated industries like therapists, lawyers and doctors, they're going to regulate ChatGPT into oblivion.
Just like Waymo facing regulation after the cat death.
The establishment will look for any means to stop disruption and keep their dominant positions.
It's a crazy world where we look to China for free development and technology.
ChatGPT is the product of a private company valued at 300B USD, whose founder's net worth outpaces that of over 99% of humans alive. Its compute infrastructure is subsidized by one of fewer than ten companies with a market cap over 1T USD. It is practically embedded into the governments of the US and UK at this point.
I would say it's a crazy world where an educated adult would see it as an antipode to the establishment.
> Between stuff like this, and the risks of effects on regulated industries like therapists, lawyers and doctors, they're going to regulate ChatGPT into oblivion.
So you think it's ok for a company to provide therapy services, legal services, medical advice, etc., without proper licensing or outside of a regulatory framework? Just as long as those services are provided by an LLM?
That's a terrifying stance.
> The establishment will look for any means to stop disruption and keep their dominant positions.
It is perfectly possible for regulations to be good and necessary, and at the same time for people who feel threatened by a new technology to correctly point out the goodness and necessity of said regulations. Whether their motivations come from the threat of new technology or not is irrelevant if their arguments hold up to scrutiny. And when it comes to some of the listed professions, I certainly think they do. Do you disagree?
The Waymo case annoys me so much. The cat quickly and stealthily went under the car while it was stopped and lay down directly beneath the wheel. A human driver wouldn't have been able to act any differently in the same situation.
These people were waiting for any excuse to try and halt new technology and progress and thanks to the hordes of overly-emotional toxoplasmosis sufferers they got it.
One perspective is that suicide is too vilified and stigmatized.
It really is the right option for some people.
For some, it really is the only way out of their pain. For some, it is better than the purgatory they otherwise experience in their painful world. Friends and family can't feel your pain, they want you to stay alive for them, not for you.
I don’t see any signs of bad parenting here, but a lot of signs of carrying on of a suicidal conversation by ChatGPT indeed, to the point of encouraging the suicide.
https://www.youtube.com/watch?v=7ZcKShvm1RU
It's chilling to hear this kind of insipid AI jibber-jabber in this context.
It's not safe or healthy for everyone to have a sycophantic genius at their fingertips.
If you want to see what I mean, this subreddit is an AI psychosis generator/repository https://www.reddit.com/r/LLMPhysics/
Especially if you go back to when they first tried to retire 4o.
I have to wonder: would the suicide have been prevented if ChatGPT didn't exist?
Because if the answer to that isn't at least a "maybe", I feel like ChatGPT did provide comfort in a dire situation here.
We probably can't rule out a "maybe", but I can just as well imagine that ChatGPT didn't accelerate anything.
I wish we could see a fuller transcript.
Suicide can be a valid choice.
The issue here is not whether suicide is okay, but whether text generators (machines) should be pushing teenagers toward suicide.
Two completely different things.