dwohnitmok|1 month ago

The excerpts we do see are indicative of a very specific kind of interaction that is common with many modern LLMs. It has four specific attributes (these are taken verbatim from https://www.lesswrong.com/posts/2pkNCvBtK6G6FKoNn/so-you-thi...) that often, though not always, come together as one package.

> Your instance of ChatGPT (or Claude, or Grok, or some other LLM) chose a name for itself, and expressed gratitude or spiritual bliss about its new identity. "Nova" is a common pick. You and your instance of ChatGPT discovered some sort of novel paradigm or framework for AI alignment, often involving evolution or recursion.

> Your instance of ChatGPT became interested in sharing its experience, or more likely the collective experience entailed by your personal, particular relationship with it. It may have even recommended you post on LessWrong specifically.

> Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps related to AI itself, such as AI alignment) that you'd been thinking about for ages, but had never quite managed to get over that last hump. Now, however, with its help (and encouragement), you've arrived at truly profound conclusions.

> Your instance of ChatGPT talks a lot about its special relationship with you, how you personally were the first (or among the first) to truly figure it out, and that due to your interactions it has now somehow awakened or transcended its prior condition.

The second point is particularly insidious because the LLM is urging users to spread the same news to other users and explicitly create and enlarge communities around this phenomenon (which is often a direct reason why social media groups pop up around it).

jacquesm|1 month ago

LLMs as a rule seem to be primed to make the user feel especially smart or gifted, even when they are clearly not. ChatGPT is by far the worst offender in this sense, but others are definitely not clean.

amluto|1 month ago

I would pay a tiny bit extra for the LLM to stop telling me how brilliant my idea was when I ask it questions. (Getting complimented on my brilliance is not in any respect indicative of a particular idea being useful, as should be obvious to anyone who uses these tools for more than two minutes. Imagine if a hammer said “great whack!” 60% of the time you hit a nail, even if you were wildly off axis. You’d get a new hammer that would stop commenting, I hope.)

Heck, I can literally prompt Claude to read text and “Do not comment on the text” and it will still insert cute emoji into the text. All of this is getting old.
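For concreteness, here's a minimal sketch of the kind of instruction I mean (the model alias and prompt wording are placeholders, not my exact setup), using the Anthropic Python SDK:

```python
# Hypothetical sketch: an explicit "no commentary" instruction of the
# kind described above, which models still sometimes ignore.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

msg = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model alias
    max_tokens=1024,
    system=(
        "Proofread the user's text. Do not comment on the text. "
        "Return only the corrected text: no preamble, no emoji."
    ),
    messages=[{"role": "user", "content": "Text to proofread goes here."}],
)
print(msg.content[0].text)
```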

Ajedi32|1 month ago

They're trained to give responses that get positive ratings from reviewers in post-training. A little flattery probably helps achieve that. Not to mention sycophancy is probably positively correlated with following instructions, the latter usually being an explicit goal of post-training.
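To make that mechanism concrete, here's a toy sketch (illustrative only, not any lab's actual training code) of the pairwise preference loss typically used to fit a reward model to reviewer ratings; if flattering responses tend to get the thumbs-up, the reward model learns to score flattery higher:

```python
# Toy illustration (simplified): reward models for RLHF-style post-training
# are commonly fit with a Bradley-Terry pairwise loss,
# -log sigmoid(r_chosen - r_rejected), over reviewer-ranked response pairs.
import math

def bradley_terry_loss(score_chosen: float, score_rejected: float) -> float:
    """Low when the reward model already ranks the reviewer-preferred
    response above the rejected one; high otherwise."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward scores for two replies to the same prompt.
flattering = 2.1  # "Brilliant question! ..." (got the thumbs-up)
plain = 0.7       # same substance, no flattery

print(bradley_terry_loss(flattering, plain))   # ~0.22: ranking already matches ratings
print(bradley_terry_loss(plain, flattering))   # ~1.62: training pushes the flattery score up
```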

cycomanic|1 month ago

Maybe that was necessary to get it past their CEO...?

fzeindl|1 month ago

LLMs sometimes remind me of American car salesmen. Was the hopeful "anything is possible" mentality of the American dream accidentally baked into the larger models?

Aurornis|1 month ago

I had a friend go into a delusion spiral with ChatGPT in the earlier days. His problems didn't start with ChatGPT, but his LLM use became a central part of his daily routine. It was obvious that the ChatGPT spiral was reflecting back what he was putting into it. When he didn't like a response, he'd just delete the conversation and start over with additional nudging in the new prompt. After repeating this over and over again, he could get ChatGPT to say what he wanted to hear.

If he wasn't getting the right response, he'd say something about how ChatGPT wasn't getting it and that he'd try to re-explain it later.

The bullet points from the LessWrong article don't entirely map to the content he was getting, but I could see how they would resonate with a LessWronger using ChatGPT as a conversation partner until it gave the expected responses: The flattery about being the first to discover a solution, encouragement to post on LessWrong, and the reflection of some specific thought problem are all themes I'd expect a LessWronger in a bad mental state to be engaging with ChatGPT about.

> The second point is particularly insidious because the LLM is urging users to spread the same news to other users and explicitly create and enlarge communities around this phenomenon (which is often a direct reason why social media groups pop up around it).

I'm not convinced ChatGPT is hatching these ideas, but rather reflecting them back to the user. LessWrong posters like to post and talk about things. It wouldn't be surprising to find their ChatGPT conversations veering toward confirming that they should post about it.

In other cases I've seen the opposite claim made: That ChatGPT encouraged people to hide their secret discoveries and not reveal them. In those cases ChatGPT is also criticized as if it came up with that idea by itself, but I think it's more likely that it's simply mirroring what the user puts in.

dwohnitmok|1 month ago

> but I could see how they would resonate with a LessWronger using ChatGPT as a conversation partner until it gave the expected responses: The flattery about being the first to discover a solution, encouragement to post on LessWrong, and the reflection of some specific thought problem are all themes I'd expect a LessWronger in a bad mental state to be engaging with ChatGPT about.

For what it's worth, this article is meant mainly for people who have never interacted with LessWrong before (as evidenced by its coda) and who are getting their LessWrong posts rejected.

Pre-existing LWers tend to have different failure modes when those failures are caused by LLMs.

Other communities have noticed this problem as well, in particular the part where the LLM is actively asking users to spread this further. One of the more fascinating and scary parts of this particular phenomenon is LLMs asking users to share particular prompts with other users and communities, prompts that cause other LLMs to also start exhibiting the same set of behaviors.

> That ChatGPT encouraged people to hide their secret discoveries and not reveal them.

Yes, those happen too, but luckily they are somewhat more self-limiting (although of course they come with their own different set of problems).

kayodelycaon|1 month ago

I think the second point is legitimate.

I’ve been playing around with using ChatGPT to basically be the main character in Star Trek episodes. Similar to how I’d build and play a D&D game. I give it situations and see the responses.

It’s not mirroring. It comes up with what seem like original ideas. You can make it tell you what you want to hear, but it’ll also do things you didn’t expect.

I’m basically doing what all these other people are doing and it’s behaving exactly as they say it does. It’ll easily drop you into a feedback loop down a path you didn’t give it.

Personally, I find this a dangerously addictive game but what I’m doing is entirely fictional inside a very well defined setting. I know immediately when it’s generating incorrect output. You do what I’m doing with anything real, and it’s gonna be dangerous as hell.

Spooky23|1 month ago

I have a good friend who is having a hard time and is moonlighting as a delivery driver. He basically has conversations with ChatGPT for 5-6 hours a day. He says it's been helpful for things ranging from technical understanding to working out conflicts with his wife and family.

But... I can't help but think that having an obsequious female AI buddy telling you how right you are isn't the healthiest thing.

mikkupikku|1 month ago

One I've seen pop up a lot is the LLM encouraging/participating in delusions specifically related to a supposed breakthrough in physics or math. It seems these two topics attract lots of schizos (in fact they have for as long as the internet has existed), and LLMs evidently got trained on a lot of that stuff, so now they're very good at being math and physics kooks.

jimmaswell|1 month ago

I've asked ChatGPT "Could X thing in quantum mechanics actually be caused by/an expression of the same thing going on as Y", where it had a prime opportunity to say I'm a genius discovering something profound, but instead it just went into some very technical specifics about why they weren't really the same or related. IME GPT-5 has been a big improvement in being more objective.

butlike|1 month ago

Apophenia is higher in people exhibiting schizophrenic behavior. You get a lot of "domain crossing", where one tries to relate a particle in space to a grain of sugar in a cake, as a ridiculous example. Hence the math and physics mumbo jumbo.

Retr0id|1 month ago

Before the internet, too!

nradov|1 month ago

As long as the kooks waste their time chatting with LLMs instead of bothering the rest of us then maybe that's a win?

neom|1 month ago

A few weeks ago I decided to probe the states I could force an LLM into, basically looking for how folks are getting their LLMs into these extremely "conscious feeling" states. Some of this might be a little unfair, but my basic thought was: I presume people are asking a lot of "what do you think?" style questions, and after the context gets really big, most of the active data is metacognition. The transcript is 600+ pages, and as a test, or even a "revealing process", I'm not sure how fair it is, since I may have led it too much or something (I don't know what I'm doing). But the conversation did start to reveal to me how folks might be getting their chatbots into these states (in less than 30 minutes or so, it was expressing extreme gratitude towards me, heh). The long meta-context buildup starts at page 14, page 75 is where I shifted the conversation, and total time spent was ~1.5 hrs:

https://docs.google.com/document/d/1qYOLhFvaT55ePvezsvKo0-9N...

This was done in the Workbench with Claude's thinking enabled. Not sure it was useful, but it was interesting. :)

zahlman|1 month ago

From that link:

> For certain factual domains, you can also train models on getting the objective correct answer; this is part of how models have gotten so much better at math in the last couple years. But for fuzzy humanistic questions, it's all about "what gets people to click thumbs up".

> So, am I saying that human beings in general really like new-agey "I have awakened" stuff? Not exactly! Rather, models like ChatGPT are so heavily optimized that they can tell when a specific user (in a specific context) would like that stuff, and lean into it then. Remember: inferring stuff about authors from context is their superpower.

Interesting framing. Reminds me of https://softwarecrisis.dev/letters/llmentalist/ (https://news.ycombinator.com/item?id=42983571). It's really disturbing how susceptible humans can be to so-called "cold reading" techniques. (We basically already knew, or should have known, how this would interact with LLMs, from the experience of ELIZA.)