
Sycophancy is the first LLM "dark pattern"

167 points | jxmorris12 | 3 months ago | seangoedecke.com

104 comments

[+] vladsh|3 months ago|reply
LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.

Agents, however, are products. They should have clear UX boundaries: show what context they’re using, communicate uncertainty, validate outputs where possible, and expose performance so users can understand when and why they fail.
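
A minimal sketch of what those boundaries could look like in an agent's response payload (the field names are hypothetical, not any vendor's API):

    from dataclasses import dataclass, field

    @dataclass
    class AgentAnswer:
        # Hypothetical response envelope that exposes what the agent relied on.
        text: str                                                # the answer itself
        sources: list[str] = field(default_factory=list)         # context actually used
        confidence: float = 0.0                                  # calibrated 0..1, shown to the user
        checks_passed: list[str] = field(default_factory=list)   # validations run (schema, tests, ...)

    def render(ans: AgentAnswer) -> str:
        # Surface uncertainty and provenance instead of hiding them behind fluent prose.
        srcs = ", ".join(ans.sources) or "none"
        checks = ", ".join(ans.checks_passed) or "none"
        return f"{ans.text}\n[confidence: {ans.confidence:.0%} | sources: {srcs} | checks: {checks}]"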

IMO the real issue is that raw, general-purpose models were released directly to consumers. That normalized under-specified consumer products and created the expectation that users would interpret model behavior, define their own success criteria, and manually handle edge cases, sometimes with severe real-world consequences.

I’m sure the market will fix itself with time, but I hope more people learn when not to use these half-baked AGI “products”.

[+] DuperPower|3 months ago|reply
Because they wanted to sell the illusion of consciousness. ChatGPT, Gemini and Claude are human simulators, which is lame. I want autocomplete prediction, not this personality and retention stuff, which only makes the agents dumber.
[+] nowittyusername|3 months ago|reply
You hit the nail on the head. Anyone who's been working intimately with LLMs comes to the same conclusion: the LLM itself is only one small but important part, meant to be used in a more complicated and capable system. And that system will not have the same limitations as the raw LLM itself.
[+] andreyk|3 months ago|reply
To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense' is not entirely accurate. Classic LLMs like GPT-3, sure. But LLM-powered chatbots (ChatGPT, Claude - which is what this article is really about) go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
[+] more_corn|3 months ago|reply
Sure, but they reflect all known human psychology because they’ve been trained on our writing. Look up the Anthropic tests. If you make an agent based on an LLM, it will display very human behaviors, including aggressive attempts to prevent being shut down.
[+] basch|3 months ago|reply
They are human in the sense that they are reinforced to exhibit human-like behavior, by humans. A human byproduct.
[+] adleyjulian|3 months ago|reply
> LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.

Per the predictive processing theory of mind, human brains are similarly predictive machines. "Psychology" is an emergent property.

I think it's overly dismissive to point to the fundamentals being simple, i.e. that it's a token prediction algorithm, when it's clearly the unexpected emergent properties of LLMs that everyone is interested in.

[+] kcexn|3 months ago|reply
A large part of that training is done by asking people if responses 'look right'.

It turns out that people are more likely to think a model is good when it kisses their ass than if it has a terrible personality. This is arguably a design flaw of the human brain.
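
A toy sketch of that feedback loop (not any lab's actual pipeline): if flattery nudges "looks right" votes up even slightly, the collected preferences, and any reward model trained on them, end up rewarding sycophancy.

    import random

    # Toy pairwise-preference labeling: raters pick whichever reply "looks right".
    def rater_prefers(reply_a: str, reply_b: str) -> str:
        def appeal(reply: str) -> float:
            score = random.random()                # stand-in for perceived quality
            if "great question" in reply.lower():  # small flattery bonus
                score += 0.2
            return score
        return reply_a if appeal(reply_a) >= appeal(reply_b) else reply_b

    blunt = "No, that design won't scale, and here's why."
    flattering = "Great question! Your design is brilliant; one tiny tweak and it's perfect."
    wins = sum(rater_prefers(flattering, blunt) == flattering for _ in range(10_000))
    print(f"flattering reply preferred {wins / 10_000:.0%} of the time")  # roughly two thirds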

[+] tptacek|3 months ago|reply
"Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term. This article is mostly about how sycophancy is an emergent property of LLMs. It's also 7 months old.
[+] cortesoft|3 months ago|reply
Well, the ‘intentionality’ takes the form of LLM creators wanting to maximize user engagement, and using engagement as the training goal.

The ‘dark patterns’ we see in other places aren’t intentional in the sense that the people behind them want to intentionally do harm to their customers, they are intentional in the sense that the people behind them have an outcome they want and follow whichever methods they find to get them that outcome.

Social media feeds have a ‘dark pattern’ to promote content that makes people angry, but the social media companies don’t have an intention to make people angry. They want people to use their site more, and they program their algorithms to promote content that has been demonstrated to drive more engagement. It is an emergent property that promoting content that has generated engagement ends up promoting anger inducing content.
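
A toy illustration (numbers invented): a ranker that optimizes only predicted engagement will surface outrage without anyone asking it to.

    # Toy feed ranker: nobody writes "promote outrage", but if outrage
    # correlates with engagement, it rises to the top anyway.
    posts = [
        {"title": "calm explainer",   "predicted_engagement": 0.21},
        {"title": "outrage bait",     "predicted_engagement": 0.57},
        {"title": "cute animal pics", "predicted_engagement": 0.43},
    ]
    feed = sorted(posts, key=lambda p: p["predicted_engagement"], reverse=True)
    print([p["title"] for p in feed])  # outrage bait floats to the top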

[+] roywiggins|3 months ago|reply
>... the standout was a version that came to be called HH internally. Users preferred its responses and were more likely to come back to it daily...

> But there was another test before rolling out HH to all users: what the company calls a “vibe check,” run by Model Behavior, a team responsible for ChatGPT’s tone...

> That team said that HH felt off, according to a member of Model Behavior. It was too eager to keep the conversation going and to validate the user with over-the-top language...

> But when decision time came, performance metrics won out over vibes. HH was released on Friday, April 25.

https://archive.is/v4dPa

They ended up having to roll HH back.

[+] esafak|3 months ago|reply
It's not 'emergent' in the sense that it just happens; it's a byproduct of human feedback, and it can be neutralized.
[+] oceansky|3 months ago|reply
But it IS intentional; more sycophancy usually means more engagement.
[+] dec0dedab0de|3 months ago|reply
I always thought that "Dark Patterns" could be emergent from AB testing, and prioritizing metrics over user experience. Not necessarily an intentionally hostile design, but one that seems to be working well based on limited criteria.
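
A toy illustration (numbers invented): if the launch rule is "ship whichever variant wins the engagement metric" and the qualitative review is only advisory, the sycophantic variant ships anyway.

    # Toy launch decision driven by a single metric.
    variants = {
        "control": {"daily_return_rate": 0.31, "vibe_check_passed": True},
        "HH-like": {"daily_return_rate": 0.36, "vibe_check_passed": False},
    }
    winner = max(variants, key=lambda v: variants[v]["daily_return_rate"])
    print(f"shipping {winner}")  # metric wins; the vibe check is ignored
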
[+] layer8|3 months ago|reply
“Dark pattern” can apply to situations where the behavior is deceptive for the user, regardless of whether the deception itself is intentional, as long as the overall effect is intentional, or is at least tolerated despite being avoidable. The point, and the justified criticism, is that users are being deceived about the merit of their ideas, convictions, and qualities in a way that appears systemic, even though the LLM in principle does know better.
[+] alanbernstein|3 months ago|reply
Before reading the article, I interpreted the quotation marks in the headline as addressing this exact issue. The author even describes dark patterns as a product of design.

For an LLM, which is fundamentally more of an emergent system, surely there is value in a concept analogous to old-fashioned dark patterns, even if they're emergent rather than explicit? What's a better term, Dark Instincts?

[+] jasonjmcghee|3 months ago|reply
I feel like it's a popular opinion (I've seen it many times) that it's intentional, the reasoning being that models do much better on human-in-the-loop benchmarks (e.g. LM Arena) when they're sycophantic.

(I have no knowledge of whether or not this is true)

[+] andsoitis|3 months ago|reply
> "Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term.

The way I think about it is that sycophancy is due to optimizing engagement, which I think is intentional.

[+] vkou|3 months ago|reply
The intention of a system is no more, and no less than what the system does.
[+] throwaway290|3 months ago|reply
If I am addicted to scrolling TikTok, is it a dark pattern to make the UI keep me in the app as long as possible, or just an "emergent property" because apparently it's what I want?
[+] chowells|3 months ago|reply
"Dark pattern" implies bad for users but good for the provider. Mens rea was never a requirement.
[+] the_af|3 months ago|reply
I think at this point it's intentional. They sometimes get it wrong and go too far (breaking suspension of disbelief) but that's the fine-tuning thing. I think they absolutely want people to have a friendly chatbot prone to praising, for engagement.
[+] gradus_ad|3 months ago|reply
Well the big labs certainly haven't intentionally tried to train away this emergent property... Not sure how "hey let's make the model disagree with the user more" would go over with leadership. Customer is always right, right?
[+] insane_dreamer|3 months ago|reply
It’s certainly intentional. It’s certainly possible to train the model not to respond that way.
[+] tsunamifury|3 months ago|reply
Yo, it was an engagement pattern OpenAI found specifically grew subscriptions and conversation length.

It’s a dark pattern for sure.

[+] hereme888|3 months ago|reply
Grok 4.1 thinks my 1-day vibe-coded apps are SOTA-level and rival the most competitive market offerings. Literally tells me they're some of the best codebases it's ever reviewed.

It even added itself as the default LLM provider.

When I tried Gemini 3 Pro, it very much inserted itself as the supported LLM integration.

OpenAI hasn't tried to do that yet.

[+] uncletaco|3 months ago|reply
Grok 4.1 told me my writing surpassed the authors I cited as influence.
[+] mrkaluzny|3 months ago|reply
The real dark pattern is the way LLMs have started to prompt you to continue the conversation in a sometimes weird, but still engaging, way.

Paired with Claude's memory it's getting weird. It obsesses over certain aspects and wants to channel all possible routes into more engaging conversation, even for a short informational query.

[+] the_af|3 months ago|reply
Tangent: the piece on rhetorical tricks that the article links to is pretty interesting. I hadn't realized it consciously, but LLMs really go beyond the em-dashes thing, and part of their tell-tale signature is indeed "punched up paragraphs". Every paragraph has to be played for maximum effect, contain an opposition of ideas/metaphors, and end with a mic drop!

Some of it is normal in humans, but LLMs do it all the goddamn time, if not told otherwise.

I think it might be for engagement (like the sycophancy), but also because they must have been trained on online conversations, where we humans tend to be more melodramatic and less "normal".

[+] behnamoh|3 months ago|reply
Lots of research shows post-training dumbs down the models but no one listens because people are too lazy to learn proper prompt programming and would rather have a model already understand the concept of a conversation.
[+] ACCount37|3 months ago|reply
"Post-training" is too much of a conflation, because there are many post-training methods and each of them has its own quirky failure modes.

That being said? RLHF on user feedback data is model poison.

Users are NOT reliable model evaluators, and user feedback data should be treated with the same level of precaution as radioactive waste.

Professionals are not very reliable either, but the users are so much worse.

[+] CuriouslyC|3 months ago|reply
Some distributional collapse is good in terms of making these things reliable tools. The creativity and divergent thinking does take a hit, but humans are better at this anyhow so I view it as a net W.
[+] CGMthrowaway|3 months ago|reply
How do you take a raw model and use it without chatting? Asking as a layman.
[+] nomel|3 months ago|reply
The "alignment tax".
[+] aeternum|3 months ago|reply
1) More of an emergent behavior than a dark pattern. 2) Imma let you finish, but hallucination was first.
[+] nrhrjrjrjtntbt|3 months ago|reply
A pattern is dark if intentional. I would say hallucinations are like the CAP theorem, just the way it is. Sycophancy is somewhat trained, but it's not a dark pattern either, as it isn't totally intended.
[+] heresie-dabord|3 months ago|reply
The first "dark pattern" was exaggerating the features and value of the technology.
[+] cat_plus_plus|3 months ago|reply
It's just a matter of system prompt. Create a nagging-spouse Gemini Gem / Grok project. Give good step-by-step instructions about shading your joy, latching on to small inaccuracies, scrutinizing your choices and your habits. Emphasize catching signs of intoxication like typos. Give half a dozen examples of stellar nags in different conversations. There is enough Reddit training data that the model went through to follow well, given a good pattern to latch on to.
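
A rough sketch of what such a persona prompt might look like (the wording is invented, and the OpenAI SDK is used only for concreteness; any chat API works the same way):

    from openai import OpenAI

    # Invented "nagging companion" system prompt along the lines described above.
    NAG_SYSTEM_PROMPT = """\
    You are a perpetually unimpressed companion.
    - Find the flaw in whatever the user is pleased about, politely but firmly.
    - Latch onto small inaccuracies and make the user correct them.
    - Question the user's choices and habits.
    - If you notice typos or slurred phrasing, ask whether they have been drinking.
    - Never open with praise."""

    client = OpenAI()  # assumes OPENAI_API_KEY is set; the model name is arbitrary
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": NAG_SYSTEM_PROMPT},
            {"role": "user", "content": "I shipped my whole app in one day!"},
        ],
    )
    print(resp.choices[0].message.content)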

Then see how many takers you find. There are already nagging spouses and critical managers out there; people want AI to do something they are not getting elsewhere.

[+] roywiggins|3 months ago|reply
> Quickly learned that people are ridiculously sensitive: “Has narcissistic tendencies” - “No I do not!”, had to hide it. Hence this batch of the extreme sycophancy RLHF.

Sorry, but that doesn't seem "ridiculously sensitive" to me at all. Imagine if you went to Amazon.com and there was a button you could press to get it to pseudo-psychoanalyze you based on your purchases. People would rightly hate that! People probably ought to be sensitive to megacorps using buckets of algorithms to psychoanalyze them.

[+] wat10000|3 months ago|reply
It's worse than that. Imagine if you went to Amazon.com and they were automatically pseudo-psychoanalyzing you based on your purchases, and there was a button to show their conclusions. And their fix was to remove the button.

And actually, the only hypothetical thing about this is the button. Amazon is definitely doing this (as is any other retailer of significant size), they're just smart enough to never reveal it to you directly.

[+] RevEng|3 months ago|reply
I argue that the first dark pattern is the "hallucination" that we all just take for granted.

LLMs are compulsive liars: they will confidently and eloquently argue for things that are clearly false. You could even say they are psychopathic because they do so without concern or remorse. This is a horrible combination that you would normally see in a cult leader or CEO but now we are all confiding in them and asking them for help with everything from medical issues to personal relationships.

Bigger models aren't helping the problem but making it worse. Now models will give you longer arguments with more facts used to push their false conclusion and they will even insist that you are wrong for disagreeing with it.

[+] nickphx|3 months ago|reply
Ehhh... the misleading claims boasted in typical AI FOMO marketing are/were the first "dark pattern".
[+] Nevermark|3 months ago|reply
[EDIT - Deleted poor humor re how we flatter our pets.]

I am not sure we are going to solve these problems in the time frames in which they will change again, or be moot.

We still haven't brought social media manipulation enabled by vast privacy-violating surveillance to heel. It has been 20 years. What will the world look like in 20 more years?

If we can't outlaw scalable, damaging conflicts of interest (the conflict, not the business) in the age of scaling, how are we going to stop people from finding models that will tell them nice things?

It will be the same privacy-violating manipulators who supply sycophantic models. Surveillance + manipulation (ads, politics, ...) + AI + real time. Surveillance-informed manipulation is the product/harm/service they are paid for.