hpoe|4 years ago
Well, I feel that's a really broad term, to just ask for bias without really defining it, but a couple off the top of my head are:
1. Someone from Utah is more likely to be a member of the Church of Jesus Christ of Latter Day Saints than someone from Pennsylvania.
2. Someone from an Arabic-speaking country is more likely to be Muslim than someone from a non-Arabic-speaking country.
3. Someone who says "eh" at the end of every sentence is more likely to be Canadian.
4. Someone who says "y'all" is more likely to be from the South.
5. If someone asks me to "Please do the needful" they are likely from India.
I've purposely chosen non-extreme examples because there are many biases all over the place. Bias ≠ prejudice.
Ultimately, if we artificially restrain AI from being "biased" in any form, we are really shooting ourselves, and those most disadvantaged, in the foot: instead of being able to use AI to discover the biases and then work on fixing them, we just pretend they don't exist.
Finally, a more provocative example: people who get payday loans are less likely to pay back loans; black people are more likely to use payday loans; ergo, black people are more likely to default on loans. If we just force an AI to ignore this, then we paper over the problem. If instead we start to examine causality, we can start to figure out the root of the issue and how to address it.
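For what it's worth, that correlation chain can be made concrete with the law of total probability. Every number below is invented purely for illustration, not a real statistic:

```python
# Hypothetical rates -- invented for illustration only.
p_default_given_payday = 0.40     # assumed default rate among payday borrowers
p_default_given_no_payday = 0.05  # assumed default rate among everyone else
p_payday_given_group_a = 0.20     # assumed payday-loan usage in group A
p_payday_given_group_b = 0.10     # assumed payday-loan usage in group B

def default_rate(p_payday):
    # Law of total probability: marginalize over payday-loan usage.
    return (p_payday * p_default_given_payday
            + (1 - p_payday) * p_default_given_no_payday)

rate_a = default_rate(p_payday_given_group_a)  # 0.12
rate_b = default_rate(p_payday_given_group_b)  # 0.085
```

The group-level gap (0.12 vs 0.085) is produced entirely by the payday-loan variable; condition on payday-loan use itself and the group difference vanishes, which is exactly the "examine causality" point.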
alok-g|4 years ago
Indeed. Use of priors does not intrinsically make the system biased. It's a bias only if those priors are incorrect for whatever reason, or if the facts specifically about the sample under consideration are not able to override the population priors.
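That "override the population priors" idea can be sketched as a toy Bayes update in odds form. All the numbers here are invented:

```python
# Toy Bayesian update: individual-level evidence overriding a
# population prior. All values are hypothetical.
prior = 0.12  # assumed base rate of default in the applicant's group

# Likelihood ratios for facts about THIS applicant (e.g. stable income,
# long credit history). Each < 1 means evidence against default.
likelihood_ratios = [0.3, 0.5, 0.4]

odds = prior / (1 - prior)
for lr in likelihood_ratios:
    odds *= lr  # sequential Bayes update in odds form

posterior = odds / (1 + odds)  # roughly 0.008: well below the prior
```

A system that lets individual evidence move the estimate this far is using the prior correctly; one that pins every member of the group near 0.12 regardless of their own facts is biased in the objectionable sense.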
samkater|4 years ago
I'm not outright disagreeing, but it seems your last statement contradicts the rest of the payday example: "If instead we start to examine causality we can start to figure out the root of the issue and how to address."
The causality piece is exactly the issue, right? People who use payday loans have less savings, are more likely to work in jobs where their hours are unstable, and have other poor financial indicators (past use of a payday loan, for example). Black people may disproportionately fall into this category, but I would argue it is wrong to effectively punish all black people (or, conversely, give other ethnicities an easier time) simply because of their race.
Biases exist, no argument there. The dilemma is what we do with them.
username90|4 years ago
Another way to word it is that an unbiased AI will never be able to perform better than humans at many tasks. Statistically accurate bias isn't a bug; it's a feature. Sometimes you want to avoid it for other reasons, like it feels wrong to assume traits are correlated with race, etc., but by default the AI should always be biased except in a few special cases.
commandlinefan|4 years ago
avs733|4 years ago
>Well I feel that is a really really broad term to just ask for bias without really defining it but a couple off the top of my head are.
I hope this doesn't come off as overly pedantic, but of course it depends on how you define it. However, even when we DON'T define it, the examples that are used reflect a definition. Fundamentally, the problem is the continued assumption that data has inherent meaning, rather than being interpreted by human beings in context.[0]
There are two conflicting definitions of bias at play here...
Definition 1 (I believe this is yours): bias is a purely statistical phenomenon, a situation where one variable is meaningfully predicted by another; a synonym for correlation.
Definition 2 (I would argue this is mine, the authors', and the colloquial one): biases refer to specific prejudices that are typically unfair. Note: in this definition, bias is a synonym for prejudice. They are not making a data argument; they are using the term bias to describe a prejudiced pattern, because they understand that AI cannot have a 'prejudice' but that societal prejudice can induce biases in data.
When people use bias to mean definition 2, they are not inherently saying all definition 1 biases are bad. Your examples, under their definition, are not biases; they are correlations. The fact that some people use definition 1 does not mean definition 2 is invalid. The authors' definition of bias starts from the idea that a bias is a prejudice, and that there are other terms besides bias to describe your examples. Arguing over the definition is distracting you from the point the authors are trying to make; it doesn't actually serve your understanding of the article. I say it that way to separate it from anything about this conversation, because I think you are coming at this in good faith and with a sincere point.
No one is forcing AI to ignore that; instead, they are saying AI should be able to put it in the context of greater societal patterns to be valid. All math, all innovations, all ideas exist within a society when they are implemented. It is how they are used that matters. As a thought experiment on your provocative example: ALL payday loans have a higher default rate...if all mainstream banks .
To give a different example, let's look at the field of educational testing and measurement (my current living). I design a test of math ability. I contextualize each question within a game of cricket. Who do you expect to do well because they have more experience with cricket? Who do you expect to underperform due to the context of the question? What am I actually measuring (hint: not just math)? If the data from such tests were inherently 'free from bias', then it wouldn't matter whether I asked demographic questions first or last, when in reality asking demographic questions before a math test lowers women's scores. Educational testing folks constantly ask: what is being measured? They follow it up with a second important question: how are this test, its data, and the resulting scores going to be used? What does it mean for a test to be fair?[1] What happens when I am trying to measure math skill and I end up measuring gender and poverty instead? If that happens, what can I do with the data? What does the data really mean?
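A minimal simulation of that kind of construct-irrelevant variance, with invented parameters:

```python
# Toy model of a "math test" whose items are wrapped in cricket
# scenarios. All parameters are hypothetical.
def observed_score(math_ability, cricket_familiarity):
    # Familiarity with the framing confers an advantage that has
    # nothing to do with the construct we claim to measure.
    return math_ability + 10 * cricket_familiarity

# Two groups with IDENTICAL math ability but different familiarity.
group_a = [observed_score(50, 1.0) for _ in range(100)]  # knows cricket
group_b = [observed_score(50, 0.1) for _ in range(100)]  # doesn't

mean_a = sum(group_a) / len(group_a)  # 60.0
mean_b = sum(group_b) / len(group_b)  # 51.0
```

The 9-point gap is pure measurement bias: both groups were constructed with the same math ability, so a model trained on these scores would "learn" a group difference that is an artifact of the instrument.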
[0] As a fun footnote, this assumption is actually embedded in the language used in research; it is why 'science' fields have tended to stick to third-person language while 'social science' and related fields have largely flipped to first-person language.
[1] https://www.ets.org/about/fairness
nmca|4 years ago
https://en.m.wikipedia.org/wiki/Lee_Jussim