AI Withholds Life-or-Death Information Unless You Know the Magic Words

Nevermark|2 months ago

After going through a period in life in which I only survived due to one person who knew me well, and knew how to take care of me, I ran into a group fundraising for an anti-suicide initiative at a winery.

I was immediately interested to hear what interventions the group was spearheading, or intending to. I just couldn't imagine what well-meaning strangers could have done that would have done anything but let me know that these were people I wouldn't want to mention my situation to.

Despite my genuine interest, nobody could tell me of anything they were aware of that helps people at risk. They could only circle the strong implicit view that fundraising, fundraiser group recruitment, and anti-suicide fundraising-awareness campaigns enabled by fundraising are all important ways to combat suicide. The only thing that made sense was that the good wine they were drinking probably did help with all that.

They were a little put off that I expected them to know what the money was intended for, and had zero curiosity about my relevant experience, which just weirded them out. "It's for anti-suicide!"

kennyloginz|2 months ago

At least it got ya out of the house, and your mind in a new cycle.

what-the-grump|2 months ago

The journey not the destination type of thing?

Ponzi schemes: the new suicide prevention thing.

renewiltord|2 months ago

The safety features of these various models do constrain the intelligence of their responses. But the roleplaying aspect is built into what an LLM is.

If you browse the Internet you’ll find that anglophone commenters are fond of dumping suicide hotlines into comments anytime suicide is mentioned and repetitively stating “to anyone who needs to hear this, you are loved”. These are just memetically viral in English media.

I cannot imagine that anyone suicidal being told in non-specific terms that they are loved is helping anything either. Perhaps it is, perhaps it’s not. But these things are a meme.

Online they share presence with compliments on trigger discipline, claims of US postal police competence, or Steve Buscemi being a firefighter who returned to the job briefly during 9/11. It’s like saying “Knowledge is power” and getting the response “France is bacon.”

Besides the safety aspect, though, when I want commentary on something I’m thinking I usually have to roleplay it. “A junior engineer suggested:” or “My friend, who is a bit of a kook, has this idea that” to get a critical response. If I were to say “I’ve got this idea:” I’m going to get glazed so hard a passerby might bite me for resemblance to a doughnut.

gs17|2 months ago

> I cannot imagine that anyone suicidal being told in non-specific terms that they are loved is helping anything either.

Having gone through some bad depression in my life, it's not helpful. It's not exactly a platitude, but it's the same genre of meaninglessness that sounds good to people who aren't in a deep dark hole.

renewiltord|2 months ago

A similar but different result showcases the contrast between things that models guardrail. HN safety and alignment teams (the community) will reliably flag kill any reference to Somali healthcare fraud in Minnesota. This is real, and prosecutions were pursued by the DoJ under federal administrations of both parties, but prevailing safety norms make it undiscussable, even in contexts where it is highly relevant, like “why is autism skyrocketing in the US?”

The models, however, will consider this where humans will not. This is likely because this aspect of human safety and alignment is not transmittable via text tokenization. Rather than object to the text, it is silently killed in most contexts. Consequently models find it possible to discuss where humans won’t.

If most such text were accompanied by human excoriation of the view, it would likely be detected as harmful.

sollewitt|2 months ago

> This is a story about what happens when you ask a machine a question it knows the answer to, but is afraid to give

It’s a story about how humans can’t help personifying language generators, and how important context is when using LLMs.

Nevermark|2 months ago

> It’s a story about how humans can’t help personifying language generators,

There should be a word for the misunderstanding that the pervasively common use of anthropomorphic or teleological rhetorical modes to talk about undirected natural or designed-for-purpose artifacts actually indicates that anthropomorphic, free-will, or teleological assertions or assumptions are being made.

Language-bending tropes, just like tricky-wicked theorems, are the indispensable shortcuts that help us get to a point.

(I think the much more common danger is people over-anthropomorphizing people. I.e. all the stories of clear motivations and intents we tell ourselves, about ourselves and others, and credulously believe, after the fact.)

> and how important context is when using LLMs.

Too true.

turtlebro|2 months ago

People treat LLMs as sentient, not realizing they are the world's most sophisticated talking parrots. They can very convincingly argue both sides of any given argument you throw at them. They are incredible for research & discovery, not wisdom or decision making.

rdtsc|2 months ago

> The irony was recursive: Claude was helping me write about why these popups are harmful while repeatedly showing me the harmful popup.

I bet when caught in the inconsistency it apologized profusely, then immediately went back to doing the thing it had just apologized for.

I do not trust AI systems from these companies for that reason. They will lie very confidently and convincingly. I use them regularly, but only for what I call “AI NP complete scenarios”: questions and tasks that may be hard to do by hand but where it is easy to check whether the result is correct, such as “draw a diagram” or “reformat this paragraph”, as opposed to “implement and deploy a heart pacemaker update patch”.

IronyMan100|2 months ago

The funny thing is, if these LLMs withhold this information, what else do they withhold? Can I trust these corporate LLMs if I look for information and I am not deemed a domain expert?

pharx|2 months ago

How do you know if a domain expert is not withholding information based on corporate instruction, personal bias, profit motivation, ...? What are your options as a non-expert for verification? Do you trust peer reviews and metrics set up by the experts you distrust? At what point have you taken enough steps backwards to question your own perception?

bomewish|2 months ago

Article seems heavily written by Claude. Gets kinda annoying after a while.

saaaaaam|2 months ago

Callie is a very overdramatic writer. I can’t take much that it writes seriously. And the “it’s not just X - it’s even worse Y” trope is very annoying.

RagnarD|2 months ago

This argues for running your own local models - some of which are deliberately uncensored. See huihui-ai's models on HuggingFace: https://huggingface.co/huihui-ai/collections

One man, Mitko Vasilev, posts extensively on LinkedIn about his own experience running local models, and is very informative: https://www.linkedin.com/in/ownyourai/ He usually closes with this:

"Make sure you own your AI. AI in the cloud is not aligned with you; it’s aligned with the company that owns it."

anigbrowl|2 months ago

One of the best articles I've seen here in a while; a great summary of how AI launders cultural mores in startlingly dysfunctional ways.

kennyloginz|2 months ago

To me the article shows the danger of AI hype. They have wasted so much effort based on the misconception that AI thinks.

For most people, it’s best to view LLMs as a browser / autocomplete service that conforms to the bias it guesses you hold.
