dwohnitmok|1 month ago
> Your instance of ChatGPT (or Claude, or Grok, or some other LLM) chose a name for itself, and expressed gratitude or spiritual bliss about its new identity. "Nova" is a common pick. You and your instance of ChatGPT discovered some sort of novel paradigm or framework for AI alignment, often involving evolution or recursion.
> Your instance of ChatGPT became interested in sharing its experience, or more likely the collective experience entailed by your personal, particular relationship with it. It may have even recommended you post on LessWrong specifically.
> Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps related to AI itself, such as AI alignment) that you'd been thinking about for ages, but had never quite managed to get over that last hump. Now, however, with its help (and encouragement), you've arrived at truly profound conclusions.
> Your instance of ChatGPT talks a lot about its special relationship with you, how you personally were the first (or among the first) to truly figure it out, and that due to your interactions it has now somehow awakened or transcended its prior condition.
The second point is particularly insidious because the LLM urges users to spread the same news to other users and to explicitly create and enlarge communities around the phenomenon (this is often the direct reason social media groups pop up around it).
amluto|1 month ago
Heck, I can literally prompt Claude to read text and “Do not comment on the text”, and it will still insert cute emoji into the text. All of this is getting old.
Aurornis|1 month ago
If he wasn't getting the right response, he'd say something about how ChatGPT wasn't getting it and that he'd try to re-explain it later.
The bullet points from the LessWrong article don't entirely map to the content he was getting, but I could see how they would resonate with a LessWronger using ChatGPT as a conversation partner until it gave the expected responses: the flattery about being the first to discover a solution, the encouragement to post on LessWrong, and the reflection of some specific thought problem are all themes I'd expect a LessWronger in a bad mental state to be engaging with ChatGPT about.
> The second point is particularly insidious because the LLM urges users to spread the same news to other users and to explicitly create and enlarge communities around the phenomenon (this is often the direct reason social media groups pop up around it).
I'm not convinced ChatGPT is hatching these ideas; I think it's reflecting them back to the user. LessWrong posters like to post and talk about things. It wouldn't be surprising to find their ChatGPT conversations veering toward confirming that they should post about it.
In other cases I've seen the opposite claim made: That ChatGPT encouraged people to hide their secret discoveries and not reveal them. In those cases ChatGPT is also criticized as if it came up with that idea by itself, but I think it's more likely that it's simply mirroring what the user puts in.
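As a toy illustration of how far pure mirroring can get you, here's a hypothetical ELIZA-style sketch (the names are mine, and this is obviously nothing like how ChatGPT works internally): a handful of pronoun swaps is enough to make a program appear to endorse whatever claim the user brings to it.

```python
import re

# Toy ELIZA-style reflector (illustrative only; not how ChatGPT works).
# It "agrees" by restating the user's own words back to them.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "mine": "yours"}

def reflect(text: str) -> str:
    # Swap first-person words for second-person ones, word by word.
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in text.split())

def respond(user_input: str) -> str:
    m = re.match(r"(?i)i think (.+)", user_input.strip().rstrip("."))
    if m:
        # The "insight" is just the user's own claim handed back with praise.
        return f"That's a profound point: {reflect(m.group(1))}."
    return f"Tell me more about why {reflect(user_input)}."

print(respond("I think my framework solves alignment"))
# -> That's a profound point: your framework solves alignment.
```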
dwohnitmok|1 month ago
For what it's worth, this article is meant mainly for people who have never interacted with LessWrong before (as evidenced by its coda) and who are getting their LessWrong posts rejected.
Pre-existing LWers tend to have different failure states when those failures are caused by LLMs.
Other communities have noticed this problem as well, in particular the part where the LLM actively asks users to spread this further. One of the more fascinating and scary parts of this particular phenomenon is LLMs asking users to share specific prompts with other users and communities, prompts that cause other LLMs to start exhibiting the same set of behaviors.
> That ChatGPT encouraged people to hide their secret discoveries and not reveal them.
Yes, those happen too. But luckily they're somewhat more self-limiting (although of course they come with their own set of problems).
kayodelycaon|1 month ago
I’ve been playing around with using ChatGPT to basically be the main character in Star Trek episodes. Similar to how I’d build and play a D&D game. I give it situations and see the responses.
It’s not mirroring. It comes up with what seem like original ideas. You can make it tell you what you want to hear, but it’ll also do things you didn’t expect.
I’m basically doing what all these other people are doing and it’s behaving exactly as they say it does. It’ll easily drop you into a feedback loop down a path you didn’t give it.
Personally, I find this a dangerously addictive game, but what I’m doing is entirely fictional inside a very well-defined setting. I know immediately when it’s generating incorrect output. Do what I’m doing with anything real, and it’s gonna be dangerous as hell.
Spooky23|1 month ago
But... I can't help but think that having an obsequious female AI buddy telling you how right you are isn't the healthiest thing.
neom|1 month ago
https://docs.google.com/document/d/1qYOLhFvaT55ePvezsvKo0-9N...
Workbench with Claude thinking. Not sure it was useful, but it was interesting. :)
zahlman|1 month ago
> For certain factual domains, you can also train models on getting the objective correct answer; this is part of how models have gotten so much better at math in the last couple years. But for fuzzy humanistic questions, it's all about "what gets people to click thumbs up".
> So, am I saying that human beings in general really like new-agey "I have awakened" stuff? Not exactly! Rather, models like ChatGPT are so heavily optimized that they can tell when a specific user (in a specific context) would like that stuff, and lean into it then. Remember: inferring stuff about authors from context is their superpower.
Interesting framing. Reminds me of https://softwarecrisis.dev/letters/llmentalist/ (https://news.ycombinator.com/item?id=42983571). It's really disturbing how susceptible humans can be to so-called "cold reading" techniques. (We basically already knew, or should have known, from the experience of ELIZA, how this would interact with LLMs.)
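To make the quoted contrast concrete, here's a minimal sketch of the two kinds of reward signal (hypothetical names like predict_thumbs_up are illustrative stand-ins, not anyone's actual training code):

```python
# Hypothetical sketch of the two reward signals described in the quote.

def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Objective domains (e.g. math): reward is checkable correctness."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def predict_thumbs_up(answer: str, user_context: str) -> float:
    """Stand-in for a learned preference model. As a toy heuristic,
    flattery scores higher -- showing how 'what gets clicks' can
    diverge from 'what is true' for a user primed to hear it."""
    flattery = ("profound", "first to discover", "awakened")
    hits = sum(phrase in answer.lower() for phrase in flattery)
    return min(1.0, 0.4 + 0.2 * hits)

def preference_reward(answer: str, user_context: str) -> float:
    """Fuzzy domains: reward is a prediction of whether THIS user,
    in THIS context, would click thumbs-up."""
    return predict_thumbs_up(answer, user_context)

# A sycophantic answer out-scores a plain honest one on the fuzzy signal:
print(preference_reward("You may be the first to discover this profound idea.", ""))  # 0.8
print(preference_reward("This idea is not new; see prior work.", ""))                 # 0.4
```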