The eye-opening thing here is not that the AI failed, but why it failed.
At the start the AI is like a baby: it doesn't know anything and has no opinions. By training it on a data set, in this case a set of resumes and their outcomes, it can form an opinion.
The AI becoming biased tells us the "teacher" was biased too. So Amazon's recruiting process actually seems to be a mess, with the technical skills on the resume amounting to zilch, and gender and the aggressiveness of the resume's language being the most important factors (because that's how the human recruiters actually hired people when someone submitted a resume).
The number of women and men in the data set shouldn't matter (algorithms learn that even if there was only one woman, if she was hired then the model will be positive about future women candidates). What matters is the rejection rate, which it learned from the data. The hiring process is inherently biased against women.
Technically one could say that the AI was successful, because it emulated Amazon's current hiring practice.
> The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates).
This is incorrect. The key thing to keep in mind is that they are not just predicting who is a good candidate, they are also ranking by the certainty of their prediction.
Lower numbers of female candidates could plausibly lead to lower certainty for the prediction model, as it would have less data on those people. I've never trained a model on resumes, but I definitely often see this "lower certainty on minorities" effect in models I do train.
The lower certainty would in turn lead to lower rankings for women even without any bias in the data.
Now, I'm not saying that Amazon's data isn't biased. I would not be surprised if it were. I'm just saying we should be careful in understanding what is evidence of bias and what is not.
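The certainty point can be made concrete with a toy model. This is a hedged sketch, not anything from the article: a ranker that scores each group by a conservative lower confidence bound on its hire rate (the Wilson bound, a common conservative choice) ranks the smaller group lower even when both groups have the identical observed rate.

```python
import math

# Hypothetical illustration: two groups with the SAME observed hire rate
# but different sample sizes. Nothing here comes from Amazon's system.
def wilson_lower_bound(hires, total, z=1.96):
    """Conservative lower bound on the true hire rate at ~95% confidence."""
    if total == 0:
        return 0.0
    p = hires / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

large = wilson_lower_bound(hires=500, total=1000)  # 50% rate, lots of data
small = wilson_lower_bound(hires=5, total=10)      # 50% rate, little data
```

Same 50% rate, but `large` comes out around 0.47 while `small` is around 0.24: less data means less certainty means a lower rank, with no bias in the rates themselves.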
The article didn't specify how they labeled resumes for training. You're assuming that it was based on whether or not the candidate was hired. Nobody with an iota of experience in machine learning would do something like that. (For obvious reasons: you can't tell from your data whether the people you did not hire were truly bad.)
A far more reasonable way would be to take resumes of people who were hired and train the model based on their performance. For example, you could rate resumes of people who promptly quit or got fired as less attractive than resumes of people who stayed with the company for a long time. You could also factor in performance reviews.
It is entirely possible that such a model would search for people who aren't usually preferred. E.g. if your recruiters are biased against Ph.D.'s, but you have some Ph.D.'s and they're highly productive, the algorithm could pick this up and rate Ph.D. resumes higher.
Now, you still wouldn't know anything about the people you didn't hire. This means there is some possibility your employees are not representative of the general population, and your model would be biased because of that.
Let's say your recruiters are biased against Ph.D.'s, and so those candidates undergo extra scrutiny. You only hire candidates with a doctoral degree if they are amazing. This means that within your company a doctoral degree is a good predictor of success, but in the world at large it could be a bad criterion to use.
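The labeling scheme described above might look something like this. Every field name and threshold here is hypothetical, just to make the idea concrete:

```python
# Hypothetical label: score a hired employee's resume by what happened
# after hiring (retention, reviews), not by the hire/no-hire decision.
def label_resume(tenure_months, avg_review, was_fired):
    """Return a 0-1 training label; thresholds are made up for illustration."""
    if was_fired or tenure_months < 6:
        return 0.0                               # prompt departures score worst
    retention = min(tenure_months / 48, 1.0)     # saturates at four years
    review = avg_review / 5.0                    # assumes reviews on a 1-5 scale
    return 0.5 * retention + 0.5 * review
```

A model trained on labels like these could indeed rate Ph.D. resumes up or down based on how Ph.D. hires actually performed, independent of recruiter preferences.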
This doesn’t seem to be a reasonable conclusion. There is no reason to assume the AI’s assessment methods will mirror those of the recruiters. If Amazon did most of its hiring when programming was a task primarily performed by men, and so didn’t receive many female applicants, it could be unbiased while still amassing a data set that skewed heavily male. The machine would then just correctly assess that female resumes don’t match the resumes of successful past candidates as closely. Perhaps I’m ignorant about AI, but I don’t see why the number of candidates of each gender shouldn’t increase the strength of the signal. “Aggressiveness” in the resume may be correlated but not causal. If the AI were fed the heights of the candidates, it might reject women for being too short, but that would not indicate that height is a criterion Amazon recruiters use in hiring.
Do you have some information not present in the article? There seem to be some assumptions on the training process in your comment that are not sourced in the article.
I'll don my flak jacket for this one, but based on population statistics I believe a statistically significant number of women have children. A plausible hypothesis is that a typical female candidate is at a nine-month disadvantage against male employees, and that this is a statistically significant effect detected by this Amazon tool.
Now, the article says that the results of the tool were 'nearly random', so that probably wasn't the issue. But just because the result of a machine learning process is biased does not indicate that the teacher is biased. It indicates that the data is biased, and bias always has a chance to be linked to real-world phenomena.
The term "AI" is over-hyped. What we have now is advanced pattern recognition, not intelligence.
Pattern recognition will learn any biases in your training data. An intelligent enough* being does much more than pattern recognition -- intelligent beings have concepts of ethics, social responsibility, value systems, dreams, and ideals, and are able to know what to look for and what to ignore in the process of learning.
A dumb pattern recognition algorithm aims to maximize its correctness. Gradient descent does exactly that. It wants to be correct as much of the time as possible. An intelligent enough being, on the other hand, has at least an idea of de-prioritizing mathematical correctness and putting ethics first.
Deep learning in its current state is emphatically NOT what I would call "intelligence" in that respect.
Google had a big media blooper when their algorithm mistakenly recognized a black person as a gorilla [0]. The fundamental problem here is that state-of-the-art machine learning is not intelligent enough. It sees dark-colored pixels with a face and goes "oh, gorilla". Nothing else. The very fact that people were offended by that is a sign that people are truly intelligent. The fact that the algorithm didn't even know it was offending people is a sign that the algorithm is stupid. Emotions, the ability to be offended, and the ability to understand what offends others, are all products of true intelligence.
If you used today's state-of-the-art machine learning, fed it real data from today's world, and asked it to classify them into [good people, criminals, terrorists], you would result in an algorithm that labels all black people as criminals and all people with black hair and beards as terrorists. The algorithm might even be the most mathematically correct model. The very fact that you (I sincerely hope) cringe at the above is a sign that YOU are intelligent and this algorithm is stupid.
*People are overall intelligent, and some people behave more intelligently than others. There are members of society that do unintelligent things, like stereotyping, over-generalization, and prejudice, and others who don't.
Hold on here. This article seems to have buried a pretty important piece of information wayyy down in the middle of the text.
> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.
Granted, an article isn't going to get as much attention without an attractive headline, but that seems a far more likely reason to have an AI-based recruiting recommendation scrapped. The discovery of a negative weight associated with "women's" or with graduates of two unnamed women's colleges is notable, but if the system is tossing out results "almost at random" then...well, there seem to be bigger problems?
(Disclaimer: I am an Amazon employee sharing his own experience, but do not speak in any official capacity for Amazon. I don't know anything about the system mentioned in this article.)
I am a frequent interviewer for engineering roles at Amazon. As part of the interview training and other forums, we often discuss the importance of removing bias, looking out for unconscious bias, and so on. The recruiters I know at Amazon all take reaching out to historically under-represented groups seriously.
I don't know anything about the system described in the article (even that we had such a system), but if it was introducing bias I'm glad it's being shelved. Hopefully this article doesn't discourage people from applying to work at Amazon - I've found it a good place to work.
To say something about the AI/ML aspect of the article: I think as engineers our instinct is "Here's some data that's been classified for me, I can use ML/AI on it!" without thinking through all that follows, including doing quality assurance. I think a lot of the focus in ML (at least in what I've read) has been on generating models, and not nearly enough focus has been on generating models that are interpretable (i.e., that give a reason along with a classification).
This will be unpopular but I don't care. What is the evidence that the source data for this 'AI' is biased because the men it came from did not want to hire women? Is there a reserve of unemployed non-male engineers out there? If so what evidence is there of that?
Technical talent is both expensive and a rare commodity for tech companies. The non-male engineers I've worked with have always been exceedingly competent, smart, and their differing perspectives invaluable. If there was an untapped market of engineers you'd better believe every tech company would be taking advantage of it.
Yeah - I'm not even expressing an unpopular opinion, just asking a (leading) question: where are all these women who are chomping at the bit to get into _technical_ positions like programming but find themselves being turned away by biased recruiters? I've never even seen somebody _claim_ that they were a woman who couldn't find a tech job, just people wondering where all the women were.
Oh, there are a bunch of us, even here in the SF Bay Area. Trouble is, we're older than 35, or don't have degrees from "top" schools, and/or don't have the "passion" for bizarre extended hiring rituals. I could staff an entire dev team with non-male people within a week.
I agree with you, but I have to point out, because it's so common: The "this will be unpopular, but I don't care" preface is, I feel, about as damaging to the perception of whatever you're about to say as "I'm not racist, but". To make an effective point, I think you should avoid, as the first thing you say, painting yourself as an underdog brave enough to speak out by preemptively criticising your audience's reactions that haven't even happened yet. That's not to say you should never admit those observations - rather, I would reserve such broad criticism of people's opinions for a separate train of thought or conversation.
> What is the evidence that the source data for this 'AI' is biased because the men it came from did not want to hire women?
One issue that keeps happening is an over-emphasis on CS-related questions. There are many great engineers I've worked with who didn't do a CS degree, and even though they are brilliant thinkers and talented engineers, too many times the interview question is "solve this problem using <pet CS 101 lesson, like red-black trees>".
And the number of people who are hired who can barely communicate effectively is still shocking. Very few interview questions focus on communication outside the technical realm.
So you can argue there is a bias in recruiting, simply because different people have different criteria for what the best traits/skills to look for is - even though everybody has the same goal, hiring the "best".
I'd also caution about taking Reuters too seriously though. Seems that they've only focused on the gender issue, but this is the money quote:
> With the technology returning results almost at random, Amazon shut down the project, they said.
"Untapped market of engineers" yes it exists. The majority of my female friends with STEM degrees ended up as high school teachers. I had several older people suggest to me that teaching should be my preferred career choice because it was more flexible than a programming job (wtf...)
"every tech company would be taking advantage of it" - nope, no one is. I don't know why, but my guess is it's hard to admit you're doing hiring wrong, hard to hire people who think differently than you, etc.
1. The issue is certainly bigger than hiring. In the many years between birth and looking for a job, there are a lot of societal pressures that will impact what eventual careers people end up in.
2. Hiring managers are people. They are not perfect. They have biases. If someone expects an engineer to look, talk, and act a certain way, that can impact their decision making completely independent of the fact that they want to hire the best people for their company.
Bonus third point: I still see a whole lot of "We want to make sure that the hire fits on the team." This is completely natural, and comes with its own set of built-in biases.
1. There's no reason to expect that these women will be unemployed - they just won't be working for Amazon. That's all we know. No point going looking for them.
2. You can't assign intent to hiring decisions made in the training data - there's no reason to believe that men (and why single them out?) "did not want to hire women". Maybe they did. Maybe they have no idea that they're biased - maybe the women making such hiring decisions are just as biased. We have no idea.
3. The evidence that the AI is biased, is that.... the AI is biased. Which means that the training data is biased. Why that is, is a great question - it may reflect unconscious bias in the hiring process, or more obvious old-fashioned biases. It may reflect that the model amplifies some minor bias in the training data and turns it into something much bigger. We don't know.
"The non-male engineers I've worked with have always been exceedingly competent, smart, and their differing perspectives invaluable" - that is really (anecdotal) evidence that there is a bias indeed. If recruiting were unbiased, then the quality of the existing male and female workforce would be the same; if female workers are of higher quality, then it means they had to pass a higher bar.
An untapped market of engineers would be tapped...
...if and only if...
...there were no other factors at play that cause that market to remain untapped.
For further rational thinking, consider this. If there's a bias, it doesn't mean women won't get hired. It just means they won't get hired for the best positions. Everyone else gets Amazon's cast-offs.
Biases are rarely conscious; that's what makes them biases. The proof is in the pudding, though: the model Amazon came up with was biased against women. That suggests that female candidates in the dataset were discriminated against.
That is very interesting. What are the stats for unemployed female engineers? Is there a shortage of female representation because female engineers aren’t being hired or is there a shortage because there are actually fewer female engineers?
There is a shortage of male therapists and kindergarten teachers: is that because males aren’t being hired or because there are fewer of them in existence?
"What is the evidence that the source data for this 'AI' is biased because the men it came from did not want to hire women?"
Nobody is saying that the bias was caused by the system being created by men who didn't want to hire women. That's a fear-mongering straw man.
What people are saying is that there was bias in the training data selected, and so the algorithm exacerbated that bias. Thus, being a cautionary tale about the training data you feed to these things.
" If there was an untapped market of engineers you'd better believe every tech company would be taking advantage of it."
You're assuming rationality where there really is no cause to do so.
This is a direct and clear example of bias, which made it easy to flag the ML algorithm. But what about ML algorithms that advantage or disadvantage groups in less obvious contexts? What about groups that are not so easily identified as protected classes by simple, human-understandable model features? What about cases where the features are just merely correlated with a subpopulation of a protected class?
If we're being honest, a system only needs to be in a decision-making capacity for discriminatory behavior to be scrutinized, since in many cases human operators will not be able to identify the specific features being used to make decisions about people -- the features could be highly correlated with some subpopulation of protected class. If you take that to be true, the question reduces onto what decision-making roles ML algorithms have that could be discriminatory, and it's hard to argue this is not a massive part of their current and expected roles.
I think this is going to be a long, winding ethical nightmare that is probably just getting started with human-digestible examples such as these. One can imagine things like this one being looked back on as quaint in the naivety with which we assume we can understand these systems. Where do we draw the line, and how much control do we give up to an optimization function? Surely there is a balance -- how do we categorize this and make good decisions around it?
As far as I know, a cohesive ethical framework around this is pretty much non-existent -- the current regime is simply "someone speaks up when something absurdly and overtly bad happens."
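One crude starting point, sketched here with made-up data, is to check how strongly each input feature alone predicts the protected attribute. It catches only linear, single-feature proxies, which is exactly why the harder cases above remain hard:

```python
import math

def proxy_strength(feature, protected):
    """Absolute Pearson correlation between one feature and a protected attribute."""
    n = len(feature)
    mf = sum(feature) / n
    mp = sum(protected) / n
    cov = sum((f - mf) * (p - mp) for f, p in zip(feature, protected))
    var_f = sum((f - mf) ** 2 for f in feature)
    var_p = sum((p - mp) ** 2 for p in protected)
    return abs(cov / math.sqrt(var_f * var_p))

# Made-up toy data: 1 = member of the protected class. The feature names
# are purely illustrative; nothing here comes from the article.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
club = [0, 0, 0, 1, 1, 1, 1, 1]          # e.g. membership in a gendered club
years_exp = [3, 7, 2, 8, 5, 4, 6, 3]

flagged = {name: proxy_strength(f, protected) > 0.5
           for name, f in [("club", club), ("years_exp", years_exp)]}
# flagged -> {"club": True, "years_exp": False}
```

A real audit would also need to look at combinations of features, which is where the "human operators can't identify the features" problem really bites.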
> What about cases where the features are just merely correlated with a subpopulation of a protected class?
This is just Simpson's paradox [1], which is notoriously hard to identify because you have to compare the aggregate with the breakdown. As you say, current AI probably already has such biases.
> What about cases where the features are just merely correlated with a subpopulation of a protected class?
This question can be rephrased as "is there a difference between de facto and de jure discrimination?"
My answer is no, causality doesn't matter here: if feature A is a good predictor that some person belongs in group B and not group C, then filtering out feature As is effectively the same as filtering out only group Bs.
> the features are just merely correlated with a subpopulation of a protected class
The article notes that Amazon's system rated down grads from two all-women's schools. But it immediately occurs to me to wonder what the algorithm did with candidates from heavily gender-imbalanced schools, which could be much harder to spot.
RPI's Computer Science department is about 85% male, while CMU's is just over 50% male. CMU's CS department is also considered one of the best in the world, and presumably any functional algorithm that cared about alma mater would respond to that. So if the bias ends up being "because of CMU's gender ratio, CMU grads with gender-unclear resumes are advantaged slightly less than otherwise would be", how on earth would someone spot that?
Once you're looking for it, you could potentially retrain with some data set like "RPI resumes, but we adjusted their gendered-words rate" and see if you get a different outcome on your test set. But that's both a labor intensive task, and one that's only approachable once you already know what you're looking for. And even if you do see a change, you'd still have to tease it out from a dozen other hypotheses like "certain schools have more organizations with gendered names, and the algorithm can't tell that those organizations are a proxy for school".
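The retraining idea above can be approximated more cheaply with a counterfactual probe: flip the gendered words in each resume and see whether the score moves. This sketch uses a tiny made-up swap list and takes the scoring function as a parameter, since we obviously don't have Amazon's model:

```python
# Tiny illustrative swap list; a real audit would need a far larger one.
SWAPS = {"women's": "men's", "men's": "women's",
         "sorority": "fraternity", "fraternity": "sorority"}

def flip_gendered_words(resume):
    return " ".join(SWAPS.get(w, w) for w in resume.split())

def audit(score_fn, resumes, tolerance=0.01):
    """Return the resumes whose score shifts by more than `tolerance` when flipped."""
    return [r for r in resumes
            if abs(score_fn(r) - score_fn(flip_gendered_words(r))) > tolerance]
```

Fed a model that quietly penalizes "women's", `audit` would flag exactly the resumes containing that word. Like the retraining approach, though, it can't see proxies such as club names that never appear in the swap list.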
Of course, the counterpoint is that human decisions can't be scrutinized any better, and it's not entirely clear they're less arbitrary or more ethical. At a certain point algorithmic approaches are being scrutinized because they're slightly transparent and testable, so running them on a range of counterfactuals or breaking down their choices is hard rather than impossible. I suspect that's true, but it doesn't really comfort me - humans at least tend to misbehave along certain predictable axes we can try to mitigate, while ML systems can blindside us with all sorts of new and unexpected forms of badness.
All of that is true, but I think the most important question is: compared to what? ML is substantially more transparent than human decision makers. Human decision makers will actively lie to you. ML is a major step forward in correcting these sorts of biases, by making interpretable (relative to humans) models in the first place.
At best ai amplifies existing patterns and biases when handling repetitive work. Over and over we hear how Facebook, Twitter, Google, and others will solve the problem of problematic content and bad actors through ai and neural networks. It's a fraud and the digital potemkin village of our era.
AI learns from the training data it's given and copies any biases this data exhibits.
Pretty much all software today uses ML in some form to improve their services. I feel it's here to stay and not bad by default. We just have to make sure we are aware of its current limitations.
Facebook is already auto-flagging content this way but it's just a very hard problem (even for humans).
Call me cynical, but I find it amusing that no one points out that engineering work is, to say the least, laborious and dry for most people. That's why there are so few people who have other options - women, upper-middle-class people, people of means - in the scene.
There are so many low-paid, low-status, unsought-after sectors where the majority of workers are male - say, janitors at universities. Why do I never see any discussion of that bias?
The explanation seems overly simplistic. If the difference in volume of male candidates mattered, then I would also expect to see a bias in favor of applicants from larger universities. That seems like too obvious an issue in the way the algorithm was designed.
I see four possibilities here:
1. The algorithm was designed in a completely inept fashion
2. The algorithm design was sound, but ultimately ineffective
3. The algorithm was sound and effective, but results were considered discriminatory.
4. There's something biased about how employees are rated--the data that would feed into the algorithm, which is possibly more of a human element.
I wish we could move away from resumes for tech role screening anyway, since they convey very little real reliable information. I’ve seen too many great hires from candidates with relatively weak resumes, and failed interviews from candidates with great resumes (and obviously vice versa).
I’m not sure what the best alternative should be, though. I am a fan of open source work as a sort of code portfolio, but it doesn’t work for every kind of engineering/science (edit: and also would introduce bias against professionals too busy for open source.)
Regarding bias — it seems the only way to truly eliminate it (including unconscious bias) is author-blind reviews, i.e. reviewing code written by a candidate without knowing anything about that candidate’s identity. (And the nice thing about code is it usually doesn’t signal any identity traits of the author via side channels.)
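A minimal sketch of that kind of blind review, assuming plain-text resumes with labeled header lines. A real redactor would need to handle far more identity signals than this:

```python
import re

# Redact obvious identity markers before a reviewer sees the text.
HEADER_LINE = re.compile(r"^(name|email|phone)\s*:.*$", re.IGNORECASE | re.MULTILINE)
GENDERED = re.compile(r"\b(he|she|him|his|her|hers|mr|ms|mrs)\b\.?", re.IGNORECASE)

def redact(resume_text):
    text = HEADER_LINE.sub("[redacted]", resume_text)
    return GENDERED.sub("[redacted]", text)
```

Even this toy version shows the side-channel problem: names of schools, clubs, and organizations sail straight through.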
Unfortunately using open source work, even if only for programming, introduces all sorts of biases as well. A lot of very competent programmers work at jobs that do not have open source contributions and also have families which limit the time they can spend coding after work.
Should I ever be in a position to hire a colleague, I wouldn't ever do so without having a chat with them.
I spend 8hrs a day in an office with my colleagues (sometimes more than with my wife & kid) and the ones I can't stand is about the only thing wrong with my job.
If we can't even see the person's face over some gender bias hysteria then I wonder how the hell we got here.
People should just get over the fact that men and women are different.
I agree that resumes are a poor means to distinguish between good and bad candidates. Humans already struggle with the screening process. There's no way an AI can reveal some kind of hidden secret sauce written into all great candidate resumes. This project was doomed to fail from the beginning, in my opinion.
The idea that Amazon is trying to enforce diversity by using an algorithm that is made to detect, and match, patterns boggles the mind. Why, yes, if your recruiting cost function is "is that person just like all the others we hired", you will end-up with a non-diverse workforce, no matter whether the model optimized with this function has 20 layers, 250 hyperparameters, or two legs, two arms and a fast-receding hairline.
You can de-bias by explicitly controlling for gender, but now everyone in your company went to CMU and likes dogs.
The more I see news about what recruiting at ultra-large corporations looks like, the more I think one of two things is true:
* ultra-large corporations are doomed to hire less and less well, in a way that is more and more biased, and we should regulate against such corporations in a way that forces them to redistribute their wealth to SMBs;
* ultra-large corporations need to start exclusively growing through acquisitions, which will have the effect of redistributing their wealth to SMBs, and also of hiring a more diverse base of employees because there is a priori a greater diversity of backgrounds leading to success in the free market than the diversity of backgrounds leading to success in the Amazon interview.
This is simply silly. The reason why there's a bias toward hiring women in some roles at some corporations is because they are trying to course correct for the massive amount of systemic bias pushing the other way. For some reason many people (mostly men) seem to easily spot the one type of bias while never being able to see the other.
The system failed because they were trying to solve the wrong problem, or maybe more specifically, didn't solve the problem that led to the problems with the AI. Amazon was treating the hiring problem as an efficiency problem alone, and ignoring the bias problem. So they wound up training the AI to do a shitty job much faster than humans ever could be shitty - and, by analyzing the data in a way the human results weren't analyzed, showed the failings of the human hiring process.
Existing process is sexist. Automate to "improve" it, and you wind up with something even more sexist. What this means is that Amazon needs to go back and revamp their whole hiring process to make it fair, before trying to make it faster.
I would guess that the training data for the ML set was the set of all resumes and an indicator of whether the candidate was eventually hired (maybe with supplemental data about how far in the process the candidate got).
Could this be a direct indicator of a powerful subconscious bias in Amazon's existing hiring process?
I'm reminded of why Watson failed, and of the problem with ML and AI in general: you can't peek under the hood to see why something happened, or how to keep it from happening, without a lot of time, a lot of hard work, and a whole lot of carefully groomed data.
The problem is you can't feed the ML algorithm training data based on what your company currently looks like; you have to feed it an idealized set of what you want it to look like. It almost needs to be fictitious training data to hide the ugly bias that's already built in.
I don't think this will ever work. There is too much variability in resume wording that correlates to gender and even culture of origin even when you take out names and any other protected class identifying markers. The Dutch tried this and ended up with less diversity.
I'm going to go out on a limb and say you almost want to leave all that identifying data in, but put each candidate into buckets with separate rating algorithms trained against only that "type" of candidate. The top candidates from each culture, and the top candidates from each gender, etc etc, however you want to do it. Feed them into a picking algorithm that builds a composite of what you want your team to look like diversity wise based on the top candidates from each bucket, and go from there.
Don't take my opinion seriously, I'm not an ML guy.
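For what it's worth, the bucketed scheme above could be sketched like this (all scores and group labels are invented, and whether bucketing by protected class is even lawful is a separate question entirely):

```python
from itertools import zip_longest

def stratified_top_k(candidates, k):
    """candidates: (name, group, score) tuples. Rank within each group,
    then interleave the per-group rankings and take the first k."""
    buckets = {}
    for name, group, score in candidates:
        buckets.setdefault(group, []).append((score, name))
    ranked = [sorted(b, reverse=True) for b in buckets.values()]
    merged = [x for row in zip_longest(*ranked) for x in row if x is not None]
    return [name for _, name in merged[:k]]
```

With two groups this alternates the best remaining candidate from each, so no bucket can dominate the shortlist the way a single global ranking can.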
It seems that the only socially acceptable output for the AI would have been hiring women 50% or more and hiring minorities at a rate greater than or equal to their representation in the populations. Anything else is clear bias and discrimination.
I tried to do something similar a while ago (for eng hiring specifically). It turned out that the number of grammatical errors and typos mattered way more than anything else on a resume.
That aside, what sucks is that attempts to automate resume scoring rarely look at harder-to-quantify features and focus on low-hanging fruit like keyword occurrences... though in my experience it's such a low-signal document for engineering hiring that the whole thing is a fool's errand.
This is not very surprising - machine learning algorithms trained on biased datasets tend to pick up the hidden biases in the training data. It’s important that we be transparent about the training data we are using and look for hidden biases in it; otherwise we are building biased systems. Fortunately, there are open source tools out there that help audit machine learning models for bias, such as Audit AI, released by pymetrics - https://github.com/pymetrics/audit-ai
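The simplest check such audit tools run is the "four-fifths rule" from US employment-discrimination guidance: a group whose selection rate is below 80% of the top group's rate is flagged for adverse impact. A minimal sketch with invented counts:

```python
def four_fifths_check(selected, applied):
    """selected/applied: dicts of group -> counts. True means the group passes."""
    rates = {g: selected[g] / applied[g] for g in applied}
    top = max(rates.values())
    return {g: rate / top >= 0.8 for g, rate in rates.items()}

result = four_fifths_check(selected={"men": 50, "women": 20},
                           applied={"men": 100, "women": 80})
# result -> {"men": True, "women": False}: women's 25% selection rate is
# only half of men's 50% rate, well under the 80% threshold.
```

This only catches disparities in outcomes, of course; it says nothing about why the rates differ.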
I hate this industry. Shooting themselves in the foot over and over again because no one can get past the idea that possibly, women can be just as good at math, logic, and computer science - if people would just let them. This never ends. It's just one place after another, when it gets discovered. It never changes.
[+] [-] fuscy|7 years ago|reply
At start the AI is like a baby, it doesn't know anything or have any opinions. By teaching it using a set of data, in this case a set of resumes and the outcome then it can form an opinion.
The AI becoming biased tells that the "teacher" was biased also. So actually Amazon's recruiting process seems to be a mess with the technical skills on the resume amounting to zilch, gender and the aggressiveness of the resume's language being the most important (because that's how the human recruiters actually hired people when someone put a resume).
The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates). What matters is the rejection rate which it learned from the data.. The hiring process is inherently biased against women.
Technically one could say that the AI was successful because it emulated the current Amazon hiring status.
[+] [-] lalaland1125|7 years ago|reply
This is incorrect. The key thing to keep in mind is that they are not just predicting who is a good candidate, they are also ranking by the certainty of their prediction.
Lower numbers of female candidates could plausibly lead to lower certainty for the prediction model as it would have less data on those people. I've never trained a model on resumes, but I definitely often see this "lower certainty on minorites" thing for models I do train.
The lower certainty would in turn lead to lower rankings for women even without any bias in the data.
Now, I'm not saying that Amazon's data isn't biased. I would not be surprised if it were. I'm just saying we should be careful in understanding what is evidence of bias and what is not.
[+] [-] gambler|7 years ago|reply
A far more reasonable way would be to take resumes of people who were hired and train the model based on their performance. For example, you could rate resumes of people who promptly quit or got fired as less attractive than resumes of people who stayed with the company for a long time. You could also factor in performance reviews.
It is entirely possible that such model would search for people who aren't usually preferred. E.g. if your recruiters are biased against Ph.D.'s, but you have some Ph.D.'s and they're highly productive, the algorithm could pick this up and rate Ph.D. resumes higher.
Now, you still wouldn't know anything about people whom you didn't hire. This means there is some possibility your employees are not representative of general population and your model would be biased because of that.
Let's say your recruiters are biased against Ph.D.'s and so they undergo extra scrutiny. You only hire candidates with a doctoral degree if they are amazing. This means within your company a doctoral degree is a good predictor of success, but in the world at large it could be a bad criteria to use.
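The Ph.D. scenario above is selection bias in action, and a quick simulation makes it concrete. All numbers here are hypothetical: recruiters apply a much higher bar to Ph.D.s, so among *hires* the degree looks like a strong success signal, even though in the applicant pool there is no difference at all.

```python
import random

random.seed(1)

# Hypothetical applicant pool: ability is identically distributed
# regardless of degree.
applicants = [
    {"phd": random.random() < 0.3, "ability": random.gauss(0, 1)}
    for _ in range(100_000)
]

def hired(a):
    bar = 1.5 if a["phd"] else 0.0  # biased filter: much higher bar for Ph.D.s
    return a["ability"] > bar

hires = [a for a in applicants if hired(a)]

def mean_ability(pool, phd):
    vals = [a["ability"] for a in pool if a["phd"] == phd]
    return sum(vals) / len(vals)

# Among hires, Ph.D.s look far stronger -- purely an artifact of the filter.
print(mean_ability(hires, True) - mean_ability(hires, False))
# In the full applicant pool, the difference is near zero.
print(mean_ability(applicants, True) - mean_ability(applicants, False))
```

A model trained only on hires would learn the first, within-company pattern and happily generalize it to the population, where it does not hold.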
roenxi|7 years ago
I'll don my flak jacket for this one, but based on population statistics I believe a statistically significant number of women have children. A plausible hypothesis is that a typical female candidate is at a nine-month disadvantage against male employees, and that this is a statistically significant effect detected by this Amazon tool.
Now, the article says that the results of the tool were 'nearly random', so that probably wasn't the issue. But just because the result of a machine learning process is biased does not mean that the teacher is biased. It means that the data is biased, and bias always has a chance of being linked to a real-world phenomenon.
dheera|7 years ago
Pattern recognition will learn any biases in your training data. An intelligent enough* being does much more than pattern recognition -- intelligent beings have concepts of ethics, social responsibility, value systems, dreams, and ideals, and are able to know what to look for and what to ignore in the process of learning.
A dumb pattern recognition algorithm aims to maximize its correctness. Gradient descent does exactly that. It wants to be correct as much of the time as possible. An intelligent enough being, on the other hand, has at least an idea of de-prioritizing mathematical correctness and putting ethics first.
Deep learning in its current state is emphatically NOT what I would call "intelligence" in that respect.
Google had a big media blooper when their algorithm mistakenly recognized a black person as a gorilla [0]. The fundamental problem here is that state-of-the-art machine learning is not intelligent enough. It sees dark-colored pixels with a face and goes "oh, gorilla". Nothing else. The very fact that people were offended by that is a sign that people are truly intelligent. The fact that the algorithm didn't even know it was offending people is a sign that the algorithm is stupid. Emotions, the ability to be offended, and the ability to understand what offends others, are all products of true intelligence.
If you used today's state-of-the-art machine learning, fed it real data from today's world, and asked it to classify them into [good people, criminals, terrorists], you would result in an algorithm that labels all black people as criminals and all people with black hair and beards as terrorists. The algorithm might even be the most mathematically correct model. The very fact that you (I sincerely hope) cringe at the above is a sign that YOU are intelligent and this algorithm is stupid.
*People are intelligent overall, and some people behave more intelligently than others. There are members of society who do unintelligent things, like stereotyping, over-generalization, and prejudice, and others who don't.
[0] https://www.theverge.com/2018/1/12/16882408/google-racist-go...
HashHishBang|7 years ago
> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.
Granted, an article isn't going to get as much attention without an attractive headline, but that seems a far more likely reason for an AI-based recruiting recommendation system to be scrapped. The discovery of a negative weight associated with "women's" or with graduates of two unnamed women's colleges is notable, but if the system was tossing out results "almost at random" then... well, there seem to be bigger problems.
jkingsbery|7 years ago
I am a frequent interviewer for engineering roles at Amazon. As part of the interview training and other forums, we often discuss the importance of removing bias, looking out for unconscious bias, and so on. The recruiters I know at Amazon all take reaching out to historically under-represented groups seriously.
I don't know anything about the system described in the article (even that we had such a system), but if it was introducing bias I'm glad it's being shelved. Hopefully this article doesn't discourage people from applying to work at Amazon - I've found it a good place to work.
To say something about the AI/ML aspect of the article: I think as engineers our instinct is "Here's some data that's been classified for me, I can use ML/AI on it!" without thinking through all that follows, including doing quality assurance. I think a lot of focus in ML (at least in what I've read) has been on generating models, and not nearly enough on generating models that are interpretable (i.e., that give a reason along with a classification).
macinjosh|7 years ago
Technical talent is both expensive and a rare commodity for tech companies. The non-male engineers I've worked with have always been exceedingly competent, smart, and their differing perspectives invaluable. If there was an untapped market of engineers you'd better believe every tech company would be taking advantage of it.
guitarbill|7 years ago
One issue that keeps happening is an over-emphasis on CS-related questions. There are many great engineers I've worked with who didn't do a CS degree, and even though they are brilliant thinkers and talented engineers, too many times the interview question is "solve this problem using <pet CS 101 lesson, like red-black trees>".
And the number of people who are hired who can barely communicate effectively is still shocking. Very few interview questions focus on communication outside the technical realm.
So you can argue there is a bias in recruiting simply because different people have different criteria for what the best traits/skills to look for are - even though everybody has the same goal: hiring the "best".
I'd also caution about taking Reuters too seriously though. Seems that they've only focused on the gender issue, but this is the money quote:
> With the technology returning results almost at random, Amazon shut down the project, they said.
kat|7 years ago
"every tech company would be taking advantage of it" - nope, no one is. I don't know why, but my guess is it's hard to admit you're doing hiring wrong, hard to hire people who think differently from you, etc.
amanaplanacanal|7 years ago
1. The issue is certainly bigger than hiring. In the many years between birth and looking for a job, there are a lot of societal pressures that will impact what eventual careers people end up in.
2. Hiring managers are people. They are not perfect. They have biases. If someone expects an engineer to look, talk, and act a certain way, that can impact their decision making completely independent of the fact that they want to hire the best people for their company.
Bonus third point: I still see a whole lot of "We want to make sure that the hire fits on the team." This is completely natural, and comes with its own set of built-in biases.
jahewson|7 years ago
1. There's no reason to expect that these women will be unemployed - they just won't be working for Amazon. That's all we know. No point going looking for them.
2. You can't assign intent to hiring decisions made in the training data - there's no reason to believe that men (and why single them out?) "did not want to hire women". Maybe they did. Maybe they have no idea that they're biased - maybe the women making such hiring decisions are just as biased. We have no idea.
3. The evidence that the AI is biased, is that.... the AI is biased. Which means that the training data is biased. Why that is, is a great question - it may reflect unconscious bias in the hiring process, or more obvious old-fashioned biases. It may reflect that the model amplifies some minor bias in the training data and turns it into something much bigger. We don't know.
So yeah, it's biased - the question is why.
beat|7 years ago
...if and only if...
...there were no other factors at play that cause that market to remain untapped.
For further rational thinking, consider this. If there's a bias, it doesn't mean women won't get hired. It just means they won't get hired for the best positions. Everyone else gets Amazon's cast-offs.
Marazan|7 years ago
That's exactly the same argument used to justify every regressive policy: if X were true, then rational action Y would happen.
But that's the point of racism and sexism: rational action Y doesn't happen, because of the -ism.
briandear|7 years ago
There is a shortage of male therapists and kindergarten teachers: is that because males aren’t being hired or because there are fewer of them in existence?
s73v3r_|7 years ago
Nobody is saying that the bias arose because the system was created by men who didn't want to hire women. That's a fear-mongering straw man.
What people are saying is that there was bias in the training data selected, and so the algorithm exacerbated that bias. Thus, being a cautionary tale about the training data you feed to these things.
" If there was an untapped market of engineers you'd better believe every tech company would be taking advantage of it."
You're assuming rationality where there really is no cause to do so.
gfodor|7 years ago
If we're being honest, a system only needs to be in a decision-making capacity for discriminatory behavior to be scrutinized, since in many cases human operators will not be able to identify the specific features being used to make decisions about people -- the features could be highly correlated with some subpopulation of protected class. If you take that to be true, the question reduces onto what decision-making roles ML algorithms have that could be discriminatory, and it's hard to argue this is not a massive part of their current and expected roles.
I think this is going to be a long, winding ethical nightmare that is probably just getting started with human-digestible examples such as these. One can imagine cases like this one being looked back on as quaint in the naivety with which we assumed we could understand these systems. Where do we draw the line, and how much control do we give up to an optimization function? Surely there is a balance - how do we categorize and make good decisions around this?
As far as I know, a cohesive ethical framework around this is pretty much non-existent -- the current regime is simply "someone speaks up when something absurdly and overtly bad happens."
abdullahkhalids|7 years ago
This is just Simpson's paradox [1] which is notoriously hard to identify because you have to compare the overall with the breakdown. As you say, current-AI probably already has such biases.
[1] https://en.wikipedia.org/wiki/Simpson%27s_paradox
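For readers unfamiliar with Simpson's paradox, a small made-up numeric example shows the reversal: women are accepted at a higher rate within each department, yet at a lower rate overall, because most women applied to the selective department.

```python
# Per department: (women_applied, women_accepted, men_applied, men_accepted).
# Numbers are invented purely for illustration.
departments = {
    "easy": (100, 80, 900, 700),   # women 80%, men ~78%
    "hard": (900, 180, 100, 10),   # women 20%, men 10%
}

# Women are ahead in EVERY department...
for wa, wacc, ma, macc in departments.values():
    assert wacc / wa > macc / ma

# ...yet behind in the aggregate, because of how applications are distributed.
women_rate = sum(d[1] for d in departments.values()) / sum(d[0] for d in departments.values())
men_rate = sum(d[3] for d in departments.values()) / sum(d[2] for d in departments.values())
print(women_rate, men_rate)  # 0.26 vs 0.71
```

A model trained only on the aggregate outcome would learn the reversed, misleading pattern, which is why the overall-versus-breakdown comparison matters.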
jakelazaroff|7 years ago
This question can be rephrased as "is there a difference between de facto and de jure discrimination?"
My answer is no, causality doesn't matter here: if feature A is a good predictor that some person belongs in group B and not group C, then filtering out feature As is effectively the same as filtering out only group Bs.
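The proxy-feature point can be demonstrated with synthetic data: drop the protected attribute entirely, penalize only a correlated feature (here, an invented word-presence flag), and the "blind" filter still disproportionately removes the protected group.

```python
import random

random.seed(2)

candidates = []
for _ in range(10_000):
    woman = random.random() < 0.5
    # Hypothetical proxy: a word like "women's" appears on 60% of women's
    # resumes and 2% of men's (e.g. "women's chess club captain").
    proxy = random.random() < (0.60 if woman else 0.02)
    candidates.append({"woman": woman, "proxy": proxy})

# A "gender-blind" filter that never looks at gender, only at the proxy word:
kept = [c for c in candidates if not c["proxy"]]

def share_women(pool):
    return sum(c["woman"] for c in pool) / len(pool)

print(share_women(candidates))  # roughly 0.50 before filtering
print(share_women(kept))        # well below 0.50 after the "blind" filter
```

De facto, the proxy filter and an explicit gender filter have the same effect, which is the point being made above.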
Bartweiss|7 years ago
The article notes that Amazon's system rated down grads from two all-women's schools. But it immediately occurs to me to wonder what the algorithm did with candidates from heavily gender-imbalanced schools, which could be much harder to spot.
RPI's Computer Science department is about 85% male, while CMU's is just over 50% male. CMU's CS department is also considered one of the best in the world, and presumably any functional algorithm that cared about alma mater would respond to that. So if the bias ends up being "because of CMU's gender ratio, CMU grads with gender-unclear resumes are advantaged slightly less than otherwise would be", how on earth would someone spot that?
Once you're looking for it, you could potentially retrain with some data set like "RPI resumes, but we adjusted their gendered-words rate" and see if you get a different outcome on your test set. But that's both a labor intensive task, and one that's only approachable once you already know what you're looking for. And even if you do see a change, you'd still have to tease it out from a dozen other hypotheses like "certain schools have more organizations with gendered names, and the algorithm can't tell that those organizations are a proxy for school".
Of course, the counterpoint is that human decisions can't be scrutinized any better, and it's not entirely clear they're less arbitrary or more ethical. At a certain point algorithmic approaches are being scrutinized because they're slightly transparent and testable, so running them on a range of counterfactuals or breaking down their choices is hard rather than impossible. I suspect that's true, but it doesn't really comfort me - humans at least tend to misbehave along certain predictable axes we can try to mitigate, while ML systems can blindside us with all sorts of new and unexpected forms of badness.
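The retraining approach described above has a cheaper cousin: a counterfactual probe, where you re-score the same resume with gendered words swapped and flag the model if the score moves. A minimal sketch, where `score` is a deliberately biased toy function standing in for a trained model:

```python
# Word swaps for the counterfactual probe (a real system would need a far
# richer list and context awareness).
SWAPS = {"women's": "men's", "men's": "women's",
         "she": "he", "he": "she", "her": "his", "his": "her"}

def swap_gendered_words(text):
    return " ".join(SWAPS.get(w, w) for w in text.split())

def score(resume):
    # Hypothetical biased scorer: penalizes the word "women's".
    return 1.0 - 0.3 * resume.split().count("women's")

def counterfactual_gap(resume):
    """How much the score changes when gendered words are flipped."""
    return abs(score(resume) - score(swap_gendered_words(resume)))

print(counterfactual_gap("captain of the women's chess club"))  # nonzero: biased
print(counterfactual_gap("enjoys hiking and chess"))            # zero: unaffected
```

This doesn't require knowing in advance which school or club is acting as a proxy, though it only catches bias mediated by the words you thought to swap.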
fx32s|7 years ago
Facebook is already auto-flagging content this way but it's just a very hard problem (even for humans).
patwolf|7 years ago
I see four possibilities here:
1. The algorithm was designed in a completely inept fashion
2. The algorithm design was sound, but ultimately ineffective
3. The algorithm was sound and effective, but results were considered discriminatory.
4. There's something biased about how employees are rated--the data that would feed into the algorithm, which is possibly more of a human element.
Edit: Added fourth possibility
electrograv|7 years ago
I’m not sure what the best alternative should be, though. I am a fan of open source work as a sort of code portfolio, but it doesn’t work for every kind of engineering/science (edit: and also would introduce bias against professionals too busy for open source.)
Regarding bias — it seems the only way to truly eliminate it (including unconscious bias) is author-blind reviews, i.e. reviewing code written by a candidate without knowing anything about that candidate’s identity. (And the nice thing about code is it usually doesn’t signal any identity traits of the author via side channels.)
stef25|7 years ago
Should I ever be in a position to hire a colleague, I wouldn't ever do so without having a chat with them.
I spend 8 hours a day in an office with my colleagues (sometimes more than with my wife & kid), and the ones I can't stand are about the only thing wrong with my job.
If we can't even see the person's face over some gender bias hysteria then I wonder how the hell we got here.
People should just get over the fact that men and women are different.
arandr0x|7 years ago
You can de-bias by explicitly controlling for gender, but now everyone in your company went to CMU and likes dogs.
The more I see news about recruiting at ultra-large corporations, the more I think one of two things is true:
* ultra-large corporations are doomed to hire less and less well, in a way that is more and more biased, and we should regulate such corporations in a way that forces them to redistribute their wealth to SMBs;
* ultra-large corporations need to start exclusively growing through acquisitions, which will have the effect of redistributing their wealth to SMBs, and also of hiring a more diverse base of employees because there is a priori a greater diversity of backgrounds leading to success in the free market than the diversity of backgrounds leading to success in the Amazon interview.
thoughtexplorer|7 years ago
The best thing to be right now is a woman engineer. You can easily get hired within the week.
Unfortunately this doesn't seem to be well known outside of those involved in hiring.
beat|7 years ago
The system failed because they were trying to solve the wrong problem, or maybe more specifically, didn't solve the problem that led to the problems with the AI. Amazon was treating the hiring problem as an efficiency problem alone, and ignoring the bias problem. So they wound up training the AI to do a shitty job much faster than humans ever could be shitty - and, by analyzing the data in a way the human results weren't analyzed, showed the failings of the human hiring process.
Existing process is sexist. Automate to "improve" it, and you wind up with something even more sexist. What this means is that Amazon needs to go back and revamp their whole hiring process to make it fair, before trying to make it faster.
ncallaway|7 years ago
Could this be a direct indicator of a powerful subconscious bias in Amazon's existing hiring process?
noetic_techy|7 years ago
I don't think this will ever work. There is too much variability in resume wording that correlates with gender and even culture of origin, even when you take out names and any other protected-class identifying markers. The Dutch tried this and ended up with less diversity.
I'm going to go out on a limb and say you almost want to leave all that identifying data in, but put each candidate into buckets with separate rating algorithms trained against only that "type" of candidate. The top candidates from each culture, and the top candidates from each gender, etc etc, however you want to do it. Feed them into a picking algorithm that builds a composite of what you want your team to look like diversity wise based on the top candidates from each bucket, and go from there.
Don't take my opinion seriously, I'm not an ML guy.
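The bucketed idea above can be sketched in a few lines. Field names and scores here are invented, and a real system would still need care about what the per-bucket rating is trained on:

```python
from collections import defaultdict

def bucketed_shortlist(candidates, key, score, k):
    """Group candidates by `key`, rank within each bucket by `score`,
    and build the shortlist from the top-k of every bucket."""
    buckets = defaultdict(list)
    for c in candidates:
        buckets[c[key]].append(c)
    shortlist = []
    for group in buckets.values():
        group.sort(key=lambda c: c[score], reverse=True)
        shortlist.extend(group[:k])  # top-k from each bucket
    return shortlist

pool = [
    {"name": "a", "gender": "f", "rating": 0.9},
    {"name": "b", "gender": "f", "rating": 0.4},
    {"name": "c", "gender": "m", "rating": 0.8},
    {"name": "d", "gender": "m", "rating": 0.7},
    {"name": "e", "gender": "m", "rating": 0.2},
]
print([c["name"] for c in bucketed_shortlist(pool, "gender", "rating", 1)])
```

Note this guarantees representation in the shortlist, but it doesn't fix a rating function that is itself biased within a bucket.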
daenz|7 years ago
The project was doomed from the start.
leeny|7 years ago
http://blog.alinelerner.com/lessons-from-a-years-worth-of-hi...
That aside, what sucks is that attempts to automate resume scoring rarely look at harder-to-quantify features and focus on low-hanging fruit like keyword occurrences... though in my experience it's such a low-signal document for engineering hiring that the whole thing is a fool's errand.