The interpretation part hit home: "The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging."
"Predict self-reported race". Not race from DNA. (That's routinely available from 23andMe, and is considered an objective measurement.[1]) They should have collected both. Now they don't know what they've measured.

[1] https://www.nytimes.com/2021/02/16/opinion/23andme-ancestry-...
Not too surprising that physical differences across ethnicities are literally more than skin deep. It wouldn’t be shocking that a model could identify one’s ethnicity based on, for example, a microscope image of their hair; why should bone be any different?
I’m more surprised that the distinguishing features haven’t been obvious to trained radiographers for decades. It would be cool to see a followup to this paper that identifies salient distinguishing features. Perhaps a GAN-like model could work—given the trained classifier network, train 1) a second network to generate images that when fed to the classifier, maximize the classification for a given ethnicity, and 2) a third network to discriminate real from fake X-Ray images (to avoid generating noise that happens to minimize the classifier’s loss function). I wonder if the generator would yield images with exaggerated features specific to a given ethnicity, or whether it would yield realistic but uninterpretable images.
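Concretely, the GAN-like probe described above might look something like the sketch below (PyTorch, with every architecture, size, and hyperparameter invented purely for illustration; the `Classifier` is an untrained stand-in for the paper's trained race classifier, which would be loaded frozen in practice):

```python
# Sketch of the GAN-like probing idea: a generator tries to maximize the
# frozen classifier's output for one class while a discriminator keeps the
# images x-ray-like. All shapes/hyperparameters here are made up.
import torch
import torch.nn as nn

IMG = 64          # hypothetical image side length
Z = 32            # latent dimension
N_CLASSES = 3     # e.g. the self-reported race labels

class Classifier(nn.Module):          # stand-in for the frozen trained model
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, N_CLASSES))
    def forward(self, x): return self.net(x)

class Generator(nn.Module):           # network 1: maximizes a chosen class
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z, 16 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, z): return self.net(z)

class Discriminator(nn.Module):       # network 2: real vs generated x-rays
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
    def forward(self, x): return self.net(x)

def train_step(G, D, C, real, target_class, g_opt, d_opt):
    bce = nn.BCEWithLogitsLoss()
    ce = nn.CrossEntropyLoss()
    z = torch.randn(real.size(0), Z)
    fake = G(z)
    # Discriminator: tell real images from generated ones.
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator: fool D while driving the frozen classifier to target_class.
    target = torch.full((real.size(0),), target_class, dtype=torch.long)
    g_loss = bce(D(fake), torch.ones(real.size(0), 1)) + ce(C(fake), target)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

The interesting output would be what `G` converges to per class: caricatured anatomy, or realistic-but-uninterpretable images, as the comment wonders.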
I think it's more likely the case that (a) most radiographers aren't trained in medical school to look for distinguishing racial features (why would they be?) and (b) in most cases the radiologist knows or can easily guess the race of the patient anyway so there's no need to try to guess it from X-ray imaging data. There are a lot of anatomical features related to race that have been known since before radiology has been a field, it's just not pertinent to the job of most radiologists.
“In this modelling study, we defined race as a social, political, and legal construct that relates to the interaction between external perceptions (ie, “how do others see me?”) and self-identification, and specifically make use of self-reported race of patients in all of our experiments.”
Perfect example of citations-driven research. The authors aren’t motivated by a genuinely interesting scientific question (“are anatomical differences between genetically distinct groups of people visible in X-rays?”). Instead, the authors know that training a classifier to predict race will generate controversial headlines and tweets. All publicity, positive or negative, leads to more citations.
This is the only reasonable possible way to do it. Races are fluid and ill-defined constructs, so self-identification is the best you can do for ground truth.
The submitted title ("AI identifies race from xray, researchers don't know how") broke the site guidelines by editorializing. Submitters: please don't do that - it eventually causes your account to lose submission privileges.

From the guidelines (https://news.ycombinator.com/newsguidelines.html):

"Please use the original title, unless it is misleading or linkbait; don't editorialize."
The fact that the model seems to be able to make highly accurate predictions even on the images in Figure 2 (including HPF 50 and LPF 10) makes me skeptical. It feels much more probable that this is a sign of data leakage than that the underlying true signal is so strong that it persists even under these transformations.
Compare the performance under high pass and low pass filters in this paper on CIFAR-10 (https://arxiv.org/pdf/2011.06496.pdf). Is it really the case that differentiating cats from airplanes is so much more fragile than predicting race from chest x-rays?
> Models trained on low-pass filtered images maintained high performance even for highly degraded images. More strikingly, models that were trained on high-pass filtered images maintained performance well beyond the point that the degraded images contained no recognisable structures; to the human coauthors and radiologists it was not clear that the image was an x-ray at all.
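For intuition, the low-/high-pass degradations the quote describes can be reproduced in a few lines of numpy (the cutoff radii of 10 and 50 echo the LPF 10 / HPF 50 settings mentioned above but are otherwise arbitrary, and a random array stands in for an x-ray):

```python
# Sketch of frequency-domain image degradation: keep only frequencies
# below (low-pass) or above (high-pass) a radial cutoff in the 2D spectrum.
import numpy as np

def filter_image(img, cutoff, mode="low"):
    """Zero out spatial frequencies above (low-pass) or below (high-pass)
    `cutoff`, measured as radial distance from the spectrum's center."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    mask = r <= cutoff if mode == "low" else r >= cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

img = np.random.rand(128, 128)
lpf = filter_image(img, 10, "low")    # keeps only coarse structure
hpf = filter_image(img, 50, "high")   # keeps only fine detail / edges
```

On a real x-ray, `hpf` at an aggressive cutoff is the regime where, per the paper, humans can no longer tell the image is an x-ray at all.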
I tend to not believe unbelievable results in machine learning. It's too easy to unintentionally cause some kind of information leakage. I haven't read the paper in detail though, so their experimental setup could be foolproof; this is not a critique of this paper specifically.
Curious for the take of a neuro-ophthalmologist. If they too are stumped, this may be a path to a deeper understanding of our visual system.

Simple transformations obviously discernible to us can blind computer vision (CAPTCHAs). There may be analogs for human vision which don’t present in the natural world. Evidence of such artefacts would partially validate our current path for artificial intelligence, as it suggests the aforementioned failures of our primitive AIs have analogs in our own.
I think there's a significantly greater than zero chance that they simply botched their ML pipeline horribly and would get their 0.98 AUCs from completely blank images.
I think it’s pretty straightforward. Imagine the Fourier transforms of some recognizable audio signals, maybe a symphony and a traffic jam. They’ll look totally different, even to the naked eye. If you chop off the low frequency components, you can still probably tell which Fourier spectrum is which. But now do the same thing in the time domain (high-pass filter the audio). It probably won’t be clear that you’re listening to a symphony anymore.
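A toy version of this analogy in numpy: two signals with different spectral shapes (a few pure tones vs. broadband noise) stay easy to tell apart from their spectra even after the low-frequency bins are discarded. The frequencies and the 1 kHz cutoff are made up for the demo:

```python
# Compare the high-frequency band of two signals' spectra: a tonal
# "symphony" stand-in and a noisy "traffic jam" stand-in.
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs                           # 1 second at 8 kHz
tones = sum(np.sin(2 * np.pi * f * t) for f in (220, 440, 660))
noise = rng.standard_normal(fs)

def high_band_profile(x, cutoff_hz=1000):
    """Magnitude spectrum with the low-frequency bins dropped."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spec[freqs >= cutoff_hz]

# The tonal signal has almost no energy above 1 kHz; the noise has lots,
# so the chopped spectra remain trivially distinguishable.
tonal_energy = high_band_profile(tones).sum()
noise_energy = high_band_profile(noise).sum()
```

Listening to `tones` after a 1 kHz high-pass, by contrast, would leave almost nothing audible, which is the time-domain half of the analogy.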
It's a whole field of research, and it's pretty trivial to generate them for most classes of ML models. It's actually quite difficult to create robust models that DON'T have this problem...
It would be nice to see more genuine, enthusiastic scientific curiosity to understand how the ML algorithms are doing this, rather than just abject terror and alarm.
It seems like the reason the researchers in this paper are concerned is precisely that they tried and failed to understand how the ML algorithms are doing this. If they’d discovered that white people have a subtly distinctive vertebra shape the model was detecting, it would have been much more of “oh, we discovered a neat fact”.
"This issue creates an enormous risk for all model deployments in medical imaging: if an AI model relies on its ability to detect racial identity to make medical decisions, but in doing so produced race-specific errors, clinical radiologists would not be able to tell, thereby possibly leading to errors in health-care decision processes."
Why would a model rely on its ability to detect racial identity to make decisions?

What kind of errors are race-specific?
Let's say you're trying to train an model to predict if a patient has a cancerous tumor based on some imaging data. You have a data set for this that includes images from people with tumors and people without, from all races. However, unbeknownst to you, most of the images from people of race X had tumors and most of the images from people of race Y did not have tumors.
If the AI is also implicitly learning to detect race from the images, it's going to learn an association that people of race X usually have tumors and people of race Y usually do not.
The problem here is that the people training the model and the clinical radiologists interpreting data from the model may not realize that race was a confounding factor in training, so they'll be unaware that the model may make racial inferences in the real world data.
If people of race X really do have a higher incidence rate for a specific type of cancer than race Y, maybe this is OK. But if the issue is that there was bias in the training/validation data set that was unknown to the people building the model, and in the real world people of race X and race Y have exactly the same incidence rate for this type of cancer, then this is going to be a problem because it's likely to introduce race-specific errors.
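This scenario is easy to simulate. The sketch below (plain numpy, with all probabilities and feature definitions invented for illustration) trains a tiny logistic regression on data where race X is spuriously associated with tumors, then evaluates it on data with identical incidence; the learned shortcut produces race-specific errors:

```python
# Toy confounding demo: a race-correlated feature becomes a shortcut when
# the training labels are biased by race, then misfires on unbiased data.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, p_tumor_x, p_tumor_y):
    race = rng.integers(0, 2, n)                 # 0 = race X, 1 = race Y
    p = np.where(race == 0, p_tumor_x, p_tumor_y)
    tumor = rng.random(n) < p
    # One feature that truly signals the tumor, one that only signals race
    # (standing in for whatever anatomy lets the model infer race).
    tumor_signal = tumor + 0.5 * rng.standard_normal(n)
    race_signal = race + 0.1 * rng.standard_normal(n)
    X = np.column_stack([tumor_signal, race_signal])
    return X, tumor.astype(float), race

def fit_logreg(X, y, steps=2000, lr=0.1):
    """Plain gradient-descent logistic regression with a bias column."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

# Biased training set: race X mostly has tumors, race Y mostly doesn't.
Xtr, ytr, _ = make_data(5000, p_tumor_x=0.8, p_tumor_y=0.2)
w = fit_logreg(Xtr, ytr)

# "Real world" data: identical incidence for both races.
Xte, yte, race = make_data(5000, p_tumor_x=0.5, p_tumor_y=0.5)
Xb = np.column_stack([Xte, np.ones(len(Xte))])
pred = (1 / (1 + np.exp(-Xb @ w))) > 0.5
err_x = (pred != yte)[race == 0].mean()          # error rate for race X
err_y = (pred != yte)[race == 1].mean()          # error rate for race Y
```

With these assumptions the model leans on `race_signal`, so healthy race-X patients tend to be over-called and race-Y tumors under-called, even though real-world incidence is identical.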
Just because the model relies on race in some way doesn’t mean that we know it relies on it. I.e., the model is, unbeknownst to us, biased on race in inaccurate ways.
Using race as an independent factor to make medical decisions isn’t unheard of today. The medical community is largely trying to stop doing that as a matter of social policy, so it’s a problem for that goal if an AI model might be doing it under the hood. See e.g. https://www.ucsf.edu/news/2021/09/421466/new-kidney-function...
It could actually be the skin: it's designed to block rays, so it might also have a different x-ray opacity, and that could be judged from the whole picture, in particular where there are several layers of melanin, or transitions from melanin to very little, like on hands and feet. Eyelids too, if they're retracted. And at the perimeter, the profile presents a different angle for the ray.

And the intention is for melanin to block x-rays too: block all rays, not just UV but deeper. Well, it has a spectrum, that cannot be denied. And if you're taking all the pixels in an image, there might be aggregate effects as I described. You get a few million pixels, let the AI use every part of the buffalo, all the information in the picture, and you can get skin color through x-rays.
The question is what this says about Africans with light-skin strictly because of albinism, ie lack of pigmentation, but otherwise totally African.
Simply go to Google Images and search: "skeletal racial differences".
subspecies are found across species-- they happen based on geographic dispersion and geographic isolation, which humans underwent for tens and hundreds of thousands of years.
Welcome to the sciences of anatomy, anthropology, and forensics.
other differences:
- slow twitch vs fast twitch muscle
- teeth shape
- shapes and colors of various parts
- genetic susceptibility to & advantages against specific diseases
Just like Darwin's finches of the Galápagos, humans faced geographic dispersion resulting in genetic, dietary (e.g. hunter-gatherer vs farmer, malnutrition), and geographical (e.g. altitude) differences which over the course of millennia produced anatomical differences. We can see this effect across all biota: bacteria, plants, animals, and yes, humans.

help keep politics out of science.
£10 says that it's not that. Anatomy is extraordinarily hard, and AI isn't that good, yet. Sure, different races have different layouts, but often that's only really obvious post mortem (i.e. when you can yank out the bones and look at them; there are of course corner cases where high-res CAT/MRI scans can pull out decent skeletal imagery in 3D). There are other cases, but those should be easy to account for.

If I had to bet, and I knew where the data was coming from, I'd say it's probably picking up on the style of imaging, rather than anything anatomical. Not all x-rays have bones in them, and not all bones differ reliably enough to detect race.
> keep politics out of science.
Yes, precisely, which is why the experiment needs to be reproduced, and theories tested through experimentation. This is important because unless we work out where this trait is coming from, we cannot be sure the diagnosis is correct. For example, those with sickle cell disease have a higher risk of bone damage[1], which could indicate they are x-rayed more often. This could warp the dataset, causing false positives for sickle-cell-style bone damage.

[1] https://www.hopkinsmedicine.org/health/conditions-and-diseas...
The article is pretty fascinating and I recommend that you actually read it. For example:
>"We found that deep learning models effectively predicted patient race even when the bone density information was removed for both MXR (AUC value for Black patients: 0·960 [CI 0·958–0·963]) and CXP (AUC value for Black patients: 0·945 [CI 0·94–0·949]) datasets. The average pixel thresholds for different tissues did not produce any usable signal to detect race (AUC 0·5). These findings suggest that race information was not localised within the brightest pixels within the image (eg, in the bone)."
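For reference, the kind of check the quoted passage describes, scoring each image by a single average-brightness number and asking how well it separates the groups, reduces to a Mann-Whitney-style AUC. A hypothetical sketch on synthetic data:

```python
# AUC of a scalar feature: the probability that a randomly chosen positive
# outranks a randomly chosen negative (ties count half). An uninformative
# feature, like the paper's average pixel thresholds, lands near 0.5.
import numpy as np

def auc(scores, labels):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)
mean_brightness = rng.random(1000)   # brightness unrelated to the label
print(auc(mean_brightness, labels))  # ≈ 0.5: no usable signal
```

An AUC near 0.5 here is exactly the "no usable signal" result the authors report for tissue brightness, in contrast to the 0.94+ AUCs of the deep models.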
The problem with 'race' as a concept isn't that you can genetically tell people apart.
Our tools are so precise that you can tell which parent a set of cousins had with DNA tests; this doesn't make them a different species, sub-species, or race from each other, even if one group has red hair and the other has black.
It's the pointless lumping together of people who are genetically distinct and drawing arbitrary, unscientific lines that's the issue.
Presumably the same experiments that can detect Asian vs Black vs White could also detect the entirely made up 'races' of Asian-or-Black, Asian-or-White and White-or-Black, since those are logically equivalent.
So are the races I made up a moment ago real things? No. But a computer can predict which category I'd assign, doesn't that make them real and important racial classifications? No it means my made up classifications map to other real genetic concepts at a lower level, like red hair.
One idea is that there is some difference in the x-rays themselves that could potentially be explained by racial disparities in access to (and quality of) healthcare. Maybe white people tend to visit hospitals with newer, better equipment or better trained radiographers and the model is picking up on differences in the exposures from that.
> We also showed that the ability of deep models to predict race was generalised across different clinical environments, medical imaging modalities, and patient populations, suggesting that these models do not rely on local idiosyncratic differences in how imaging studies are conducted for patients with different racial identities.
>Race prediction performance was also robust across models trained on single equipment and single hospital location on the chest x-ray and mammogram datasets
Sure, it’s possible that bias due to the radiographer is the culprit, but this seems unlikely.
I mean, if skin color, eye shape, and other visible, "mechanical" characteristics can differ between humans, it's not that big of a leap to observe that certain non-visible characteristics can differ too.
Physiologies are created by genetics, and differences in ancestry are the basis for self-identified race.
Ordinary computer vision can also identify race fairly accurately, the high pass filter thing is merely pointing out that ML classifiers don't work like human retinas.
It's astonishing how many epicycles HN comments are trying to introduce into a finding that anyone would have predicted. Research which confirms predictable things is valuable of course, but no apple carts have been upset.
I would guess a causal chain through environmental factors, given how much archeologists are able to tell about prehistoric humans’ lives based on bone samples.
Bone density, micro-fractures, and deviations in shape. The Mongols famously had bowed legs from spending a majority of their waking lives on horseback.
I recall seeing a paper in the early 2010s with an algorithm that could discriminate between white and Asian based on head MRI images. I'm having trouble finding it now, but this finding to me is not too surprising.
So there are material differences that support certain prejudices; big surprise, turns out human societies have been (and still are) working very hard for thousands of years to craft those differences: isolating, separating, enslaving, oppressing, exiling their scapegoat “others”. The question is not whether the differences are real, but whether we can prevent AI from being used to perpetuate those differences. TBH, we don’t stand a chance; we live in a society where most people cannot even wrap their heads around why it shouldn’t perpetuate those differences.
> Importantly, if used, such models would lead to more patients who are Black and female being *incorrectly* identified as healthy
I think this is the point a lot of people are missing; they think, "So what if 'black' correlates to unhealthy and the model notices? It's just seeing the truth!"
However, I'm still wondering how this incorrectness works; can anyone explain?
Edit: Clue: The AI is predicting self-reported race, and the authors indicated that self-reported race correlates poorly to actual genetic differences.
My guess is that they are using an American dataset, which I suspect encodes socioeconomic data into the samples: rich people have access to better diagnostics, get seen earlier, and are treated sooner; conversely, poorer patients present later and with more obvious symptoms. The type of system used to take the images would also be strongly correlated.
If this is true I suspect a human could be trained the same way.
I read once that a radiologist can't always explain what they see in an image that leads them to one diagnosis or another, they say that after seeing many of them they just know.
So I suspect the same could be done for race. This would be a super interesting thing to try with some college students - pay them to train for a few days on images and see how they do.