item 21278924

AI Detects Heart Failure from One Heartbeat: Study

126 points | rusht | 6 years ago | forbes.com | reply

67 comments

[+] corodra|6 years ago|reply
Wasn't a similar claim made about an AI detecting skin cancer from moles? Once the AI was deployed in the real world, it failed miserably. I think it produced a ton of false positives because it was trained on images where the cancerous moles all had rulers photographed next to them and the benign ones didn't. So it just picked up on the ruler as a cancer indicator.
[+] Sanzig|6 years ago|reply
Would you happen to have a source for that story? My workplace has really swallowed the AI Kool-Aid lately, so I would like to have some cautionary counterexamples to demonstrate potential pitfalls of the technology.

It's got a lot of interesting applications for our field which I am excited about, but there seems to be a tendency among non-experts to consider it a magic bullet that can solve any sort of problem. In particular, I am concerned about applications where conventional approaches have already converged on an optimal solution that's used operationally, but somebody wants to throw AI at it because they thought it might be cool without first understanding the implications.

[+] et2o|6 years ago|reply
Yeah. Long term it still looks like that idea might work out, but that was a funny story.

This paper offers like 1/1000th the evidence of that one.

[+] gwern|6 years ago|reply
> Once the AI was deployed in the real world, it failed miserably.

They didn't show that.

[+] Cass|6 years ago|reply
As a doctor as opposed to an AI researcher, so many of the choices this study makes are baffling to me.

First of all, why just one heartbeat? You never capture just one heartbeat on an ECG anyway, and "is the next heartbeat identical to the first one?" is such an important source of information that it seems completely irrational to exclude it. At least pick TWO heartbeats. If you're gonna pick one random heartbeat, how do you know you didn't pick an extrasystole by accident? (Extrasystoles look different from, and often less healthy than, "normal" heartbeats, since they originate from different regions of the heart.)

Secondly, why heart failure and not a heart attack? One definition of heart failure is "the heart is unable to pump sufficiently to maintain blood flow to meet the body's needs," which can be caused by all sorts of factors, many of them external to the actual function of the heart - do we even know for sure that there are ANY ECG changes definitely tied to heart failure? Why not instead try to detect heart attacks, which cause well-defined and well-researched known ECG changes?

(I realize AIs that claim to be able to detect heart attacks already exist. None of the ones I've personally worked with have ever been usable. The false positive rate is ridiculously high. I suppose maybe some research hospital somewhere has a working one?)

[+] Cass|6 years ago|reply
To add to this, looking at figure 4, why is their "average" heartbeat so messed up? That's not what a normal average heartbeat looks like. P is too flat, Q is too big, R is blunted, and there's an extra wave between S and T that's not supposed to be there at all. If their "healthy patient" ECGs were bad enough to produce this mess on average, it's no surprise their AI had no trouble telling the data sets apart.

(For comparison, the "CHF beat" looks a lot more like a healthy heartbeat.)

[+] soVeryTired|6 years ago|reply
> First of all, why just one heartbeat?

I think it's a sort of academic machismo. "Look what we can do - isn't it amazing?"

I saw the same thing in robotics recently. An academic came to give a talk on localisation using computer vision: they cross-referenced shop signs seen by a robotic camera with the shops' locations on a map to get a rough estimate of where the robot was. My first question was "what is the incremental benefit of this approach when combined with GPS?" It turned out that the researchers just hadn't used GPS at all - almost like they considered it to be "cheating".

I feel like many academic disciplines have unwritten 'rules' that you need to follow if you want to be included in the conversation. Not all of those rules are sensible.

[+] et2o|6 years ago|reply
I'm going to sound like a skeptical jerk here, but 490,000 heartbeats is how many patients? From what I recall these public ECG datasets are like 20 patients who underwent longitudinal ECGs. 500k heart beats is like 5 person-days of ECG recordings.

Ninja Edit: N=~30 patients. For something as readily available as ECGs, they really should have tried to get more patients. A single clinic anywhere does more than 30 EKGs per day. Suggesting this is clinically applicable is ridiculous. It's way too easy to overfit. Chopping up a time series from one patient into 1000 pieces doesn't give you 1000x the patients.

I actually think this approach probably will work; it's very reasonable given recent work from Geisinger and Mayo. But why are ML people doing press releases about such underwhelming studies?
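The "chopping up a time series doesn't give you 1000x the patients" concern above is concrete: if beats from the same patient land in both train and test, the model can recognize the patient rather than the disease. A minimal sketch of a patient-level (group-aware) split that avoids this, using toy data where all counts are hypothetical:

```python
# Toy illustration of splitting beats by patient rather than at random,
# so no patient contributes beats to both train and test.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

n_patients, beats_per_patient = 30, 100
# one "group" label per beat identifying which patient it came from
patient_ids = np.repeat(np.arange(n_patients), beats_per_patient)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(patient_ids, groups=patient_ids))

# A group-aware split leaves no patient on both sides.
overlap = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
print(len(overlap))  # 0
```

A plain random split over beats would almost certainly put the same patient on both sides, which is exactly the leakage being criticized.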

[+] carbocation|6 years ago|reply
Yes, Table 5 shows that N is 18 without CHF and 15 with CHF. These come from separate data sets that have EKG data sampled at different frequencies.

Basically, they took 18 electrocardiographic tracings (sampled at 128 Hz) from participants without CHF, of whom 13 come from women. They compared them to 15 electrocardiographic tracings (sampled at 250 Hz) from participants with CHF, of whom 4 come from women.

Hard to even know where to begin with this one.

[+] joker3|6 years ago|reply
A lot of machine learning people don't really understand study design or power or things like that. It's gotten a little better over the past decade or so, but this is an area where the field has a lot of room to improve.
[+] Ballas|6 years ago|reply
And also different sources for positive and negative samples.
[+] RosanaAnaDana|6 years ago|reply
I mean, the issue would be in the structure of the cross-validation approach. Say, set training = 29, test = 1, build models, etc.: how well did you do on the one? Rinse, wash hands, repeat 30x. That's your cross-validation error rate.

It's not that difficult.
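The leave-one-patient-out procedure described above can be sketched in a few lines. Everything here (features, labels, classifier choice) is a hypothetical stand-in for illustration:

```python
# Leave-one-out cross-validation over patients: train on 29, test on the
# held-out one, repeat 30 times, and report the aggregate error rate.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))          # 30 patients, 5 toy features each
y = np.array([0] * 15 + [1] * 15)     # 15 controls, 15 CHF (hypothetical)

errors = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    errors += int(model.predict(X[test_idx])[0] != y[test_idx][0])

print(f"LOOCV error rate: {errors / len(X):.2f}")
```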

[+] manmal|6 years ago|reply
Didn't they mention that they sampled only 5 minutes of heartbeats per patient? That would be n=1633, assuming a heart rate of 60.
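Spelling out that arithmetic (using the thread's assumed resting heart rate of 60 bpm):

```python
# Back-of-envelope: how many patients would 490,000 beats imply
# at 5 minutes of recording per patient and 60 beats per minute?
total_beats = 490_000
beats_per_patient = 5 * 60      # 5 minutes * 60 beats/minute = 300
patients = total_beats // beats_per_patient
print(patients)  # 1633
```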
[+] paulcole|6 years ago|reply
> I'm going to sound like a skeptical jerk here

Well you're on the right site for it!

At least you didn't claim that you could come up with something better by tinkering on a rainy Sunday afternoon.

[+] brilee|6 years ago|reply
So the first clarification is that heart failure != heart attack. Heart failure is a chronic condition in which the heart is unable to pump hard enough to keep blood flowing through the body. It typically results in blood pooling in the legs, shortness of breath, etc.

The study avoids the obvious pitfall, which is putting different slices of one patient's data into both training and test. The press reports the training accuracy (100%), while the test accuracy/sensitivity/precision metrics are all around 98%.

Another encouraging sign is that when you dig into the 2% error rate, a majority of those errors turned out to be mislabeled data.

The study also acknowledges the following:

"Our study must also be seen in light of its limitations... First the CHF subjects used in this study suffer from severe CHF only...could yield less accurate results for milder CHF."

I think this is a good proof of concept but that the severe CHF and tiny sample size (33 patients) means that we're a long ways away from clinical usage.

[+] carbocation|6 years ago|reply
The study looks at 33 patients total, and the cases and controls come from entirely different data sets, with data coming from different devices that recorded signal at different frequencies.

There is nothing to see here.

[+] RosanaAnaDana|6 years ago|reply
Well, the issues are fundamental to the calculation of their error statistics. These models and their error rates are, well, crap. If any of the people in my group came back to me with this as an error assessment, they'd be re-doing their work.
[+] qiqitori|6 years ago|reply
Pff, I can detect heart failure with no heartbeats at all.
[+] manmal|6 years ago|reply
Care to explain?
[+] fencepost|6 years ago|reply
This is interesting, but more because it indicates that there's adequate data in a single heartbeat to make such a diagnosis. In practical terms it's probably not nearly so relevant, because it sounds like they were working with the raw data, not a tracing. By the time you have a patient hooked up to the proper equipment for this diagnosis, you're going to be getting adequate data anyway.

The main impact might be that, if this holds up, people could be tested with a short hook-up in an office instead of 24-hour monitoring where they have to bring back a Holter device the next day. Of course, that 24-hour dataset may have independent value of its own for further diagnostics beyond just whether the patient has CHF.

[+] VHRanger|6 years ago|reply
The study is not worth paying attention to.

The datasets for positive cases and negative cases come from different databases. n=30 patients, on top of it.

All this does is recognize the patient/ECG technician who recorded the data. It's basically certain it doesn't generalize.

[+] ryanschneider|6 years ago|reply
IMO, the important part is Section 3.3 of the [paper](https://www.sciencedirect.com/science/article/pii/S174680941...), particularly the image at https://ars.els-cdn.com/content/image/1-s2.0-S17468094193017... To my eye, the difference in shape of the orange and green signals could also be found through more traditional signal processing/statistical means than machine learning.

In a past job I did a combination of manual and machine-learning-based analysis of cardiac signals. We didn't have ECG, but did have PPG (blood flow) and PCG (sound) signals, and a pretty large study group. I recall there being one study participant whose signals were very clearly indicative of heart failure, enough that we raised the issue with our medical advisor of whether the subject should be deanonymized and contacted. In the paper they state that "the CHF subjects used in this study suffer from severe CHF only"; my suspicion is that a simpler, "hand rolled" model based on features of the ECG could compete very well with this CNN approach for finding the same level of pathology in the ECG signal, without the "black box" of a CNN casting doubt on the technique.
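One shape of such a "hand rolled" model: average each class's beats into a template and compare a patient's average beat to the healthy template with a simple distance. The beats below are synthetic bumps, purely illustrative, not real ECG morphology:

```python
# Template-matching sketch: no CNN, just averaged beats and a
# mean-squared distance to a healthy template. All signals synthetic.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 128)

def make_beats(amplitude, n=50):
    # crude synthetic "beats": a Gaussian bump plus noise; the CHF
    # class gets a blunted peak (illustrative only)
    bump = amplitude * np.exp(-((t - 0.5) ** 2) / 0.002)
    return bump + rng.normal(0, 0.05, (n, t.size))

healthy = make_beats(1.0)
chf = make_beats(0.6)

# build the template from half the healthy beats, score on held-out data
template = healthy[:25].mean(axis=0)

def score(beats):
    # mean squared distance of this group's average beat to the template
    return float(((beats.mean(axis=0) - template) ** 2).mean())

print(score(healthy[25:]) < score(chf))  # True: CHF beats sit further away
```

For clearly separated pathology, a transparent statistic like this is much easier to audit than a CNN's decision.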

[+] nradov|6 years ago|reply
Congestive heart failure can also be detected fairly reliably from a sudden increase in weight, since it causes fluid retention. There are several programs underway to give Internet-connected scales to high-risk patients, which report weight every day.
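The weight-based screen can be a very simple rule. The thresholds below (more than 2 lb overnight or more than 5 lb in a week) are commonly cited CHF self-monitoring guidance, but treat both the numbers and the function as illustrative, not clinical advice:

```python
# Toy alert rule over a patient's daily weight readings (pounds).
def weight_alert(daily_weights_lb):
    """Flag sudden, fluid-retention-like weight gain."""
    w = daily_weights_lb
    for i in range(1, len(w)):
        if w[i] - w[i - 1] > 2:             # >2 lb gained overnight
            return True
        if i >= 7 and w[i] - w[i - 7] > 5:  # >5 lb gained in a week
            return True
    return False

print(weight_alert([180, 180.5, 183.2]))  # True (2.7 lb overnight jump)
print(weight_alert([180, 180.2, 180.1]))  # False
```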
[+] pkaye|6 years ago|reply
It could also be kidney failure. My weight ballooned by 30 lbs when I was progressing towards it. Even the doctor was saying the usual "you need to exercise and eat healthy" until he got my blood test results.
[+] nycbenf|6 years ago|reply
Heart failure patient here. This is kinda cool but tempered a bit by the fact that I've seen multiple cardiologists make a diagnosis by just glancing at a 12 lead ECG sheet. There are some pretty recognizable hallmarks.
[+] mikece|6 years ago|reply
Perhaps the title/premise might best be summarized as: based on everything we know, we can detect heart failure from monitoring one's body for one heartbeat.

Along with doing a lot of good and making a lot of early catches, I suspect that relying on AI to do medical analysis is going to bring into sharp relief just how much medical science DOESN'T know about the human body and its mysteries. I think we're a long, long way from handing medical science over to AI, and the real fun of AI-guided exploration is about to begin.

[+] kbody|6 years ago|reply
Clickbait title aside, I find that the ethical issues around AI raised by Musk etc. shouldn't be about AI taking over the planet, but rather about overfitted or otherwise unrealistic models being pushed for PR or whatever, irresponsibly playing with people's health and hopes.
[+] querious|6 years ago|reply
One of our simplest "screening" questions for DS roles at my company is: "Your model is 100% accurate. How do you feel?" If the answer is anything other than deep skepticism (data leakage, trivial dataset, etc.), it's a big red flag.
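A concrete way the "100% accurate" red flag arises is duplicate rows leaking across the train/test split. A toy demonstration, with labels that are pure noise so no real signal exists to learn:

```python
# Leakage demo: the "test" set is copies of training rows, so a 1-NN
# classifier scores 100% even though the labels are random noise.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)   # labels carry no information at all

X_train, y_train = X, y
X_test, y_test = X[:20].copy(), y[:20]   # leaked duplicates

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
acc = (model.predict(X_test) == y_test).mean()
print(acc)  # 1.0 - perfect "accuracy" on pure-noise labels
```

Any pipeline reporting perfect accuracy deserves this kind of audit before celebration.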
[+] wil421|6 years ago|reply
How long until society becomes Gattaca? Sorry citizen our “AI” has detected genetic anomalies you will be a disposable factory worker. The rich would surely pay for their children to be genetically altered.
[+] fermenflo|6 years ago|reply
What exactly guarantees that reality? Why can't these tools be used for good moving forward? e.g. detecting heart problems. Seems a little arbitrary to spin technological advancements as progress towards some inevitable dystopian AI-driven future.
[+] logicbombr|6 years ago|reply
Last year we started cloud-recording obstetric ultrasound videos. We add more than 17,000 ultrasound exams to our platform each month. It's probably the largest dataset of obstetric ultrasound videos in the world (~300,000 exams). Reading news like this makes me think about how we could explore our dataset using ML/AI to help produce better diagnoses. I have no idea how (we're not an AI company).

If someone here wants to start a project with AI on top of ultrasounds, I'm all in.

let me know at hn at angra.ltd and I can give more details

[+] smt88|6 years ago|reply
I'm not sure that data set will mean anything without human-drawn conclusions about the patients (diagnoses, abnormalities, etc.).
[+] godelzilla|6 years ago|reply
I guess adding "Congestive" to the title would've ruined the clickbait.

Also, how can the detection of a progressive disease be 100% accurate? I guess details ruin the clickbait too.

[+] ryanmcbride|6 years ago|reply
I don't see any mention in the article of how many false positives they had, so... yeah, we'll see how effective this actually is.