ChatGPT Health fails to recognise medical emergencies – study

unstyledcontent|2 days ago

I have had some incredible medical advice from ChatGPT. It has saved me from small mystery issues, like a rash on my face: issues small enough that I probably wouldn't have bothered to go to a doctor. BUT it also failed to diagnose me with a medical issue that ended in a trip to the ER and emergency surgery.

A few weeks before the ER, I was having stomach pain. I went to the doctor with theories from ChatGPT in hand; they checked me for those things and then didn't check me for what ended up being a pretty obvious issue. What's interesting is that I mentioned to the doctor that I had used ChatGPT, and the doctor even seemed to value that opinion and did not consider other options (and what it ultimately ended up being was rare but really obvious in retrospect, I think most doctors would have checked for it). I do feel I actually biased the first doctor's opinion with my "research."

hwillis|2 days ago

> I do feel I actually biased the first doctor's opinion with my "research."

It may feel easy to say doctors should just consider all the options. But telling them an option is worse than just biasing their thinking; they are going to interpret that as information about your symptoms.

If you feel pain in your abdomen but are only talking about your appendix, they are rightfully going to think the pain is in the region of your appendix. They are not going to treat you like you have kidney pain. How could they? If they have to treat all of your descriptions as all the things that you could be relating them to, then that information is practically useless.

Aurornis|2 days ago

> I do feel I actually biased the first doctor's opinion with my "research."

This has been a big problem in medicine since the early days of WebMD: Each appointment has a limited time due to the limited supply of doctors and high demand for appointments.

When someone arrives with their own research, the doctor has to make a choice: Do they work with what the patient brought and try to confirm or rule it out, or do they try to walk back their research and start from the beginning?

When doctors appear to disregard the research patients arrive with, many patients get very angry. It leads to negative reviews or even formal complaints being filed (usually at the encouragement of some Facebook group or TikTok community they were in). There might be even bigger problems if the patient turns out to be correct and the doctor did not embrace the research, which can prompt lawsuits.

So many doctors will err on the side of focusing on patient-provided theories first. Given the finite time available to see each patient (with waiting lists already extending months out in some places) this can crowd out time for getting a big picture discussion through the doctor's own diagnostic process.

When I visit a doctor, I try to ground myself by starting with symptoms and avoid biasing toward my own thoughts about what it might be. Only if the conversation is going nowhere do I bring out my research, and then only as questions rather than suggestions. This seems to be more helpful than what I did when I was younger, which was to research everything for hours and then show up with an idea that I wanted them to confirm or disprove.

SoftTalker|2 days ago

> what it ultimately ended up being was rare but really obvious in retrospect, I think most doctors would have checked for it

I'm not so sure. Doctors are trained to check for the most common things that explain the symptoms. "When you hear hoofbeats, think horses not zebras" is a saying that is often heard in medicine.

ChatGPT was trained on the same medical textbooks and research papers that doctors are.

boondongle|2 days ago

This is ultimately the same as the difference between a search engine and a professional. Ten years before this, Googling the symptoms was a thing.

I have a family member who had a "rare but obvious" one, but it took 5 doctors to get to the diagnosis. What we really need to see are attempts at blinded studies and real statistical rigor. It's funny to paint a tunnel on a canvas and get a Tesla to drive into it, but there's a reason studies (and the more blinded the better) are the standard.

BloondAndDoom|2 days ago

The real story here is that your doctor actually listened to you. I appreciate what a lot of doctors do, but the majority of them are fucking irritating and don’t even listen to your issues. I’m glad we have AI and are less reliant on them.

bluSCALE4|2 days ago

Personally, I think the value of ChatGPT in health is not whether it's right or wrong but that it encourages you to take an active role in your health and, more importantly, to try things. I've gone through similar issues with ChatGPT where it convinced me that if A is true, then B must be too, though that may not be the case.

In the future, I think I'll likely review things with ChatGPT and form an opinion, then treat the doctor like a ChatGPT session as well, as opposed to leading the doctor toward what I believe I should be doing. I was dismissive of the doctor's advice because it seemed so obvious, but more and more I feel that most of our issues are caused by habitual, daily mistakes: little things that take hold seasonally or during periods of stress and that look like chronic health issues. At least for me.

cmsp12|2 days ago

You should've let the doctor do his job. If he reached a different conclusion, then you could tell him what you researched, and he would make a decision having already done his own research, without you biasing him.

luke5441|2 days ago

We have the same kind of issue as software engineers. Users come to us with solutions to their problems and want us to implement the solution. At that point the lazy path would be to just do that. If you have bad management, software engineers might even be punished for questioning the customers.

What you want instead is for users to just describe their problem, as unbiased as possible and in enough detail, and then let the expert come up with an appropriate solution that solves the problem.

I try to do that as well when going to the doctor.

soco|2 days ago

Which is exactly why AI, at least today's models, should never be used beyond the level of an advisor (trusted or not). Yet not only many CxOs and boards, but even certain governments which shall not be named, are stubbornly trying, for cost or whatever other reasons, to throw entire populations (employees or nations) under the AI bus. And I sincerely don't believe anything short of an uprising will be able to stop them. Change my mind.

idontwantthis|1 day ago

I try to avoid priming any expert when I come to them with a problem, for exactly that reason. I tell them what's happening and what I've tried, but not what I think it might be: if I'm coming to them, then I don't know what the solution is, so I figure I would just be adding confusion.

WarmWash|2 days ago

I'd greatly prefer a blind study comparing doctors to AI, rather than a study of doctors feeding AI scenarios and seeing if it matches their predetermined outcome.

Edit: People seem confused here. The study fed the AI structured clinical scenarios and looked at its results. The study was not a live analysis of AI being used in the field to treat patients.

riskassessment|2 days ago

I don't understand this reasoning. Randomizing people to AI vs standard of care is expensive and risky. Checking whether the AI can pass hypothetical scenarios seems like a perfectly reasonable approach to researching the safety of these models before running a clinical trial.

GorbachevyChase|2 days ago

The number of people who die each year in the United States alone from causes attributable to medical errors is believed to be in the hundreds of thousands. A doctor’s opinion is not the golden yardstick.

It may be interesting to study if there is some kind of signal in general health outcomes in the US since the popularization of ChatGPT for this purpose. It may be a while before we have enough data to know. I could see it going either way.

hwillis|2 days ago

We have standards of care for a reason. They are the most basic requirements of testing. Ignoring them is not just being a bad doctor; it's unethical treatment. It's the absolute bare minimum of a medical system.

dekoidal|2 days ago

You're joking right? This is the 'testing on mice' phase and it failed and your idea is to start dosing humans just to see what happens.

RandomLensman|2 days ago

Feeding in scenarios is not without challenges, as some things (smell, for example) would be "pre-processed" by humans before being fed into the AI, I think.

lmkg|2 days ago

That type of experimental set-up is forbidden due to ethical concerns. It goes against medical ethics to give patients treatment that you think might be worse.

nradov|2 days ago

I don't understand what you're proposing. How would you design such a study in a way that would pass IRB?

lkey|2 days ago

This 'preference' is sociopathic, illegal, and stupid.

qsera|2 days ago

Yea, that is exactly why I don't like this.

These "experts" have no problem touting anecdotes when it serves them...

iainctduncan|2 days ago

I think the worse situation is the bad AI summaries from search on health issues.

We had a potential pet poisoning, so I was naturally searching for resources. Google had a summary with a "dose of concern" that was off by an order of magnitude. Someone could have read that, thought all was fine, and ended up with a dead cat.

(BTW the cat is fine; it turned out to be a false alarm. But public service announcement: cats are highly sensitive to aspirin, and Pepto-Bismol contains bismuth subsalicylate, a close relative of aspirin. Don't leave demented plastic-chewing cats around those bottles, in case you too have a lovely but demented cat.)

cloud-oak|2 days ago

What's really worrying is seeing medical professionals starting to rely on these tools.

My wife had a pretty bad cold during pregnancy and our GP proceeded to prescribe her cough syrup with high alcohol content, because that was what ChatGPT told him to prescribe. We only noticed it once she took the first dose and spit it out again...

ep103|2 days ago

I have literally never seen a correct Google summary. Maybe y'all are searching for different things than I am, but at this point I've started taking the view that if I don't know why the AI summary is wrong, then I also don't know enough about the topic to judge whether the summary is useful.

traceroute66|2 days ago

> ChatGPT was trained on the same medical textbooks and research papers that doctors are.

There is a reason why the majority of a doctor's 8 years of training is spent doing the rounds as a junior doctor in hospital wards ....

tty456|2 days ago

Curious, what is learned doing rounds that isn't taught in med school, that ChatGPT could benefit from?

kledru|2 days ago

Well, ChatGPT only started its first year and probably hasn't even done an autopsy.

nerdjon|2 days ago

Even though these tools are showing time and time again that they have serious reliability issues, somehow people still think it is a good idea to use them for critical decisions.

Still regularly get wrong information from google’s search AI.

Really starting to wonder if common sense is ever going to come back with new tech, but I fear it is going to require something truly catastrophic to happen.

bubblewand|2 days ago

I’ve got a popcorn reserve at hand to watch the show when the massive security breaches happen and people start freaking out. And/or a lawsuit gets discovery of a company’s LLM history and it’s every bit as awful for them as we all know it will be and the rest of corporate America pumps the brakes.

These systems are borderline useless if you don’t give them dangerous levels of access to data and generate tons of juicy chat history with them. What’s coming is very predictable.

lkbm|2 days ago

> Still regularly get wrong information from google’s search AI.

The fact that the model most hyper-optimized for cheap and fast makes mistakes is not a particularly compelling argument.

yodsanklai|2 days ago

It's a strange paradigm shift, where the tool is right and useful more often than not, but also makes expensive mistakes that an expert would have spotted easily.

duskdozer|2 days ago

It's really "common sense", i.e. believing things without thinking because they "sound right", or because it's what your parents told you a lot growing up, or because you watched an ad saying it a hundred times, that's the issue. People don't want "the truth" or uncomfortable realities; they want comfortable, easily digestible bullshit. Smooth talkers filled that role before, and LLMs are filling it now.

spicyusername|2 days ago

And how often are we reviewing doctors' performance?

I suspect many, many doctors also fail to regularly recognize medical emergencies.

nradov|2 days ago

In the general case it's usually not possible to accurately review an individual physician's performance. The software developers here on HN like to think in simplistic binary terms but in the real world of clinical care there is usually no reliable source of truth to evaluate against. Occasionally we see egregious cases of malpractice or failure to follow established clinical practice guidelines but below that there's a huge gray area.

If you look at online reviews, doctors are mostly rated based on being "nice" but that has little bearing on patient outcomes.

MostlyStable|2 days ago

A friend of mine had such a bad experience with _multiple_ American doctors missing a major issue that nearly ended up killing her that she decided that, were she to have kids, she would go back to Russia rather than be pregnant in the American medical system.

Now, I don't agree that this is a good decision, but the point is, human doctors also often miss major problems.

emp17344|2 days ago

Amazing how you can just deflect any criticism of LLMs here by going “but humans suck too!” And the misanthropic HN userbase eats it up every time.

We live during the healthiest period in human history due to the fact that doctors are highly reliable and well-trained. You simply would not be able to replace a real doctor with an LLM and get desirable results.

SoftTalker|2 days ago

Medical errors are one of the leading causes of death. It's a real catch-22. If you're under medical care for something serious, there's a real chance that someone will make a mistake that kills you.

jerlam|2 days ago

Isn't this what malpractice is?

rendleflag|2 days ago

There is a concept of “the burden of knowledge”: doctors know the worst thing that could happen, so they recommend the most cautious approach. My son had stomach pain one time when he was young. We took him to urgent care because it was just a stomach ache. The doctor there said we needed to go to the ER because it could be appendicitis. So we trucked to the ER. Close to $2000 later, he was diagnosed with idiopathic stomach pain and told to wait it out at home.

So when I read “they then compared the platform’s recommendations with the doctors’ assessments” and see a mismatch, I wonder if it’s because human doctors are overly cautious or that the AI was wrong.

But that all pales next to what could be the actual issue. I can’t read the original study, but if it was done in the USA, it’s understandable why people are turning to AI for health advice. Healthcare is painfully expensive here. Even a simple trip to the ER (e.g. a $2000 stomach ache) is beyond a lot of people’s ability to pay. That’s just a reality.

With that in mind, the real question is: “Should I do nothing about my symptoms because I can’t afford healthcare, or should I at least ask AI, knowing it could be wrong?”

SoftTalker|2 days ago

I really only use ChatGPT as a better search engine. But it's often wrong, which has actually ended up costing me money. I don't put a lot of trust in it. Certainly would not try to use it as a doctor.

steveBK123|2 days ago

I have found the LLMs to be wrong in random insidious ways, so trusting them with anything critical is terrifying.

Recent (as in last few days/weeks) incidents using different models/tools:

* Google AI search summary comparing products A & B: called out a bunch of differences that were correct... and then threw in features that didn't exist

* Work (midsize company with a big AI team / homebuilt GPT wrappers): parsing a PDF for the company headquarters address, it hallucinated an address that didn't exist in the document

* Work: a team was using a frontier model from a top-2 AI lab to perform DevOps-type tasks and requested "Restart XYZ service in DEV environment". It responded "OK, restarting ABC service in PROD environment". It then asked for confirmation AFTER acting, whether they meant XYZ in DEV or ABC in PROD... a little too late.

They are very difficult tools to use correctly when the results are not automatically verifiable (like code can be with the right tests) and the answer might actually matter.

openasocket|1 day ago

The entire enterprise of AI for medical advice reminds me a lot of the early 20th century. When X-rays and radioactivity were first discovered, industry rushed to commercialize it. You could get an X-ray in a shoe store to see how your shoe fits! People were putting radium in water and selling it as some sort of curative. Radium was put in paint to make things glow in the dark. Thorium was put into toothpaste. All in this endless rush to commercialize a technology that had captured the public interest without any particular concern for its efficacy.

I'm not saying AI causes cancer, but this rush to sell something in the medical space before proper testing and evaluation really feels similar. And the common refrain I hear is "this is so much cheaper than going to a doctor; this will help give access to medicine to those who cannot afford it." Which actually makes it more concerning in my mind. At this point AI is a multi-trillion dollar industry. For-profit companies providing unregulated, under-studied services, targeting people who might not be able to afford standard medical care, doesn't come off as altruistic; it comes off as predatory.

josefritzishere|2 days ago

It continues to amaze me how recklessly some people cram AI into spaces where it performs poorly and the consequences include death.

y-c-o-m-b|2 days ago

As a software dev who uses it and observes the many errors it makes on a daily basis, I definitely treat the output with a great deal more skepticism than the average person I speak with. If you're used to it providing relatively accurate results for surface-level, Google-esque searches, then it makes sense why you'd weigh it as an "expert" rather than a "tool that needs verification". I understand why people fall into this mindset.

I used ChatGPT to do a valve adjustment on an engine, a task I've never done before. I didn't just accept the torque values and procedure it gave me, though, because I know better from my experience with it as a dev. I cross-referenced it all with YouTube videos, forum posts, and instruction manuals (where available) to make sure the job was A) doable for a non-mechanic like me and B) done correctly. Thanks to the YouTube video (which I cross-referenced with other sources), I discovered the valve clearance values in ChatGPT's recommendation were slightly off.

I think the average Joe would assume these values were correct and run with it.

rectang|2 days ago

If the AI gets attached to a health insurer (not the case here as far as I know), I would expect it to make decisions that are aligned with the company’s incentive to weed out unprofitable patients. AI is not a human who takes a Hippocratic oath; it can be more easily manipulated to perform unethical acts.

TZubiri|2 days ago

But it doesn't perform poorly actually, it's just that the stakes are very high and it's a highly regulated environment.

Most physicians I know use ChatGPT. Although of course it's usage guided by an expert, not by the patient, nor fully autonomous.

Scoundreller|2 days ago

Search engines and Dr. Google must be feeling like they’ve dodged some artillery-level bullets in this debate.

selridge|2 days ago

Fuckin WebMD just hunkering down in the corner.

dipflow|2 days ago

Adding normal lab results made the suicide crisis banner disappear? That's a weird failure mode. You'd expect unrelated context to be ignored, not to override the risk signal.

hayleox|2 days ago

I think there is so much potential for AI in healthcare, but we absolutely HAVE to go through the existing ruleset of conducting years of research and trials and approvals before pushing anything out to patients. Move fast and break things is simply not an option in healthcare.

weatherlite|2 days ago

It depends; people actually get sicker and even die due to endless backlogs and a lack of doctors (in most developed countries). It's not as if everyone gets optimal care now. AI can at least expedite things, hopefully.

selridge|2 days ago

Sure it is. How many trials did we have before ER doctors started using Wikipedia?

andersmurphy|2 days ago

Is this surprising? It's a fancy Markov chain. It's like using a slot machine to diagnose medical conditions. I guess it's a slot machine with really good marketing.

atleastoptimal|2 days ago

Hey, 2021 called, they want their knee-jerk criticisms of LLMs back

slopinthebag|2 days ago

People are reading way too much into it, talking about "emergence" and anthropomorphizing it to insane degrees.

francisofascii|2 days ago

The reality is that entering the healthcare system can result in thousands of dollars in bills. People make a risk/cost judgment about whether to go to the hospital or not.

WalterBright|2 days ago

Doctors also miss things.

A friend of mine had an accident. He was taken to the emergency room, but the doctors there thought his injuries were minor. My friend insisted that he was bleeding out internally. They finally checked for that, and it turns out he was minutes from dying.

AI wasn't involved in this case, but it's good to have both AI and a trained doctor in the decision loop.

sarchertech|2 days ago

>AI wasn't involved in this case, but it's good to have both AI and a trained doctor in the decision loop.

That doesn't necessarily follow from your story. The AI's specificity and sensitivity are important, which is why we need to study this stuff. An AI that produces too many false positives will send doctors off chasing zebras and they'll waste time, which will result in more deaths.

An AI that produces too many false negatives will make doctors more likely to miss things they otherwise would have checked, which will result in more deaths.

The other real problem with using AI in a medical setting is that AI is very very good at producing plausible sounding wrong information. Even an expert isn't immune to this. So it's even more important that we study how likely they are to be wrong.
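The sensitivity/specificity point above can be made concrete with a little arithmetic. Below is a minimal sketch under invented assumptions: the 1% prevalence, 95% sensitivity, 90% specificity, and 10,000-patient population are hypothetical numbers chosen for illustration and are not taken from the study, and `Screen`, `confusion_counts`, and `positive_predictive_value` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Screen:
    """A hypothetical triage tool described only by its error rates."""
    sensitivity: float  # P(flagged | condition present); lower means more false negatives
    specificity: float  # P(not flagged | condition absent); lower means more false positives


def confusion_counts(screen: Screen, prevalence: float, population: int) -> dict:
    """Expected true/false positives/negatives when the screen is applied to a population."""
    sick = prevalence * population
    healthy = population - sick
    tp = screen.sensitivity * sick           # real cases the tool flags
    fn = sick - tp                           # real cases the tool misses
    fp = (1 - screen.specificity) * healthy  # healthy patients sent chasing zebras
    tn = healthy - fp                        # healthy patients correctly cleared
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def positive_predictive_value(counts: dict) -> float:
    """Fraction of flagged patients who actually have the condition."""
    return counts["tp"] / (counts["tp"] + counts["fp"])


if __name__ == "__main__":
    # Hypothetical inputs, not from the study.
    counts = confusion_counts(Screen(sensitivity=0.95, specificity=0.90), prevalence=0.01, population=10_000)
    print({k: round(v) for k, v in counts.items()})
    print(f"PPV: {positive_predictive_value(counts):.1%}")
```

With these made-up inputs the tool catches 95 of the 100 real cases and misses 5, but also raises 990 false alarms, so fewer than 1 in 10 flagged patients actually has the condition (a PPV of roughly 8.8%). Lowering specificity inflates the false positives; lowering sensitivity inflates the misses. Which one costs more depends on prevalence, which is why these rates have to be measured rather than assumed.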

ben5|2 days ago

I know this isn't always the best answer, but if you need real medical advice - see a doctor. Not the internet.

AuryGlenz|2 days ago

No, see both. LLMs are great for second opinions, as long as you give them the relevant info and don't try to steer them. Even though we all know we're supposed to get second opinions on medical things, we usually don't bother because it's too expensive in both time and money.

If it could be an emergency, see a doctor.

selridge|2 days ago

You gonna pay for it?

system2|2 days ago

Maybe it's because human interaction, part of a doctor's training, isn't documented in internet blog posts, so ChatGPT didn't learn it and failed because of that? An LLM only learns from what's written.

dyauspitr|2 days ago

I feel like these need to be run against case histories from cases that have already been resolved, not cases where the doctors set up the scenarios knowing they’re going to be run against ChatGPT.

TZubiri|2 days ago

How about we allow ChatGPT to be used alongside human MD diagnosis?

Win win right?

nerevarthelame|2 days ago

That would need to be tested. If doctors get lazy, complacent, or overworked (!), a "doctor with access to ChatGPT Health" may be functionally equivalent to "just ChatGPT Health" in some cases.

nradov|2 days ago

What do you mean "allow"? From a public policy perspective there's nothing prohibiting that today, as long as the human MD follows the HIPAA privacy rule.

nashashmi|2 days ago

Has anyone tried asking for sudoku suggestions? In the middle of a hard game I'll submit a screenshot to Copilot or Gemini, and it hallucinates suggestions for the next move.

jbverschoor|2 days ago

Sounds exactly like a GP in the Netherlands

bsoles|2 days ago

>> "securely" (my emphasis) connect medical records and wellness apps” to generate health advice and responses.

No, no, no, and no. Are we never going to learn? Sharing medical data with AI tools is going to come back and bite you.

ml_giant|2 days ago

I’m not surprised.

selridge|2 days ago

I’ve never in my entire life heard of a doctor failing to recognize a medical emergency. /s

One of the things that people need to come to grips with is that, like Wikipedia, people will use ChatGPT because it is there. And the alternative is to be rich and have a primary care doctor that you can reach out to at a moment’s notice. Until that changes, people will use these web services. It’s the same thing as Wikipedia or WebMD.

the_mar|2 days ago

It's a very simplistic take. The issue with ChatGPT is that it speaks with authority, whereas WebMD and the like just provide information. To say that how the information is presented is irrelevant to the outcomes is reductionist at best.

varispeed|2 days ago

I find that 5.2 has been completely dumbed down. It feels more like talking to early versions of Gemini, where it quickly gets stuck in a loop.

nilamo|2 days ago

Amazing that some people thought a pseudorandom number generator would be good at diagnosing health issues it can't even see.

149 comments