Basically, one instance of bias is that many crime-prediction models are trained on police data, which means they will predict crime in the places most targeted by the police anyway. The model's predictions then amplify that effect, since more training data is generated from the places that are now policed more often, and so on.
There are lots of resources out there on AI fairness these days. I think everyone who tries something like crime prediction should read up on the topic.
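The feedback loop described above is easy to sketch numerically. This is a stdlib-only toy model with all numbers invented: two districts with identical true crime rates, patrols allocated in proportion to recorded crime, and detections feeding back into the record.

```python
# Toy model of the police-data feedback loop (all numbers invented).
# Two districts with IDENTICAL true crime rates; patrols follow the
# recorded data, and new records follow the patrols.

true_rate = [100, 100]       # actual incidents per period, equal everywhere
records = [60.0, 40.0]       # historical records skewed by past patrolling
PATROLS = 10                 # total patrol units
DETECT = 0.05                # detections per patrol-unit per incident

for period in range(50):
    total = records[0] + records[1]
    patrols = [PATROLS * r / total for r in records]      # follow the data
    detected = [DETECT * p * t for p, t in zip(patrols, true_rate)]
    records = [r + d for r, d in zip(records, detected)]  # data accumulates

share = records[0] / (records[0] + records[1])
print(f"district 0 share of recorded crime: {share:.2f}")  # still 0.60
```

Despite equal underlying rates, the recorded 60/40 split never corrects itself; with any superlinear detection effect (e.g. detections growing faster than patrol share), it would widen instead of merely persisting.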
So if the AI identifies insider trading at trading firms, banks, etc., we should beware that this would create a feedback loop: scrutiny concentrates on the investment and banking sectors while the mom-and-pop insider traders get ignored? That enforcement goes where crime is rampant rather than where it isn't, and that could be a bad thing?
Any such system is, or would be, potentially very dangerous. Crime data is not the same thing as crime. Populations that are over-policed will be disproportionately represented in any such data set, leading to higher predicted crime, leading in turn to more over-policing (a feedback loop). I implore anyone attempting to build such a system to consider the serious issue of machine bias and its implications in the real world.
This is an area that was explored some years ago but was ultimately determined to have civil rights pitfalls. Crime reporting is only as good (or as biased) as the humans who report and input the crime data. Crime "training" data for AI systems can therefore be very biased, and AI may only magnify those biases further: a sort of self-perpetuating feedback loop.
Having worked in law enforcement at various levels (state and federal) in a prior professional life, I can attest to the differences in what gets reported, and how, based upon who was working or supervising and where they were assigned. Humans are simply not reliable reporters for this kind of data. No matter how hard we try to make the reports plain and standardized, our biases, one way or another, will always seep in.
Inspired by a Kaggle competition (https://www.kaggle.com/c/sf-crime), one of my older blog posts involved predicting the type of arrest in San Francisco (given that an arrest occurred), using features such as location and timing with the then relatively new LightGBM algorithm: http://minimaxir.com/2017/02/predicting-arrests/
The model performance isn't good enough to usher in precrime, even in the best case. There are likely better approaches nowadays (e.g., since the location data is spatial, a convolutional neural network might work better).
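As a sketch of that spatial idea, point incidents can be binned into a fixed grid to form the image-like input a convolutional network would consume. The grid size and San Francisco bounding box below are rough assumptions, and a real pipeline would use numpy; this is stdlib-only:

```python
# Bin (lat, lon) incidents into a 2D occurrence grid -- the image-like
# representation a CNN could take as input. Bounds are approximate.

GRID = 4                           # 4x4 cells over the bounding box
LAT_MIN, LAT_MAX = 37.70, 37.82    # rough San Francisco latitude bounds
LON_MIN, LON_MAX = -122.52, -122.35

def cell(lat, lon):
    """Map a coordinate to a (row, col) grid cell, clamped at the edges."""
    row = min(GRID - 1, int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID))
    col = min(GRID - 1, int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRID))
    return row, col

def to_grid(incidents):
    """Count incidents per grid cell."""
    counts = [[0] * GRID for _ in range(GRID)]
    for lat, lon in incidents:
        r, c = cell(lat, lon)
        counts[r][c] += 1
    return counts

# Three made-up incidents: two downtown-ish, one on the southwest edge.
grid = to_grid([(37.78, -122.41), (37.78, -122.42), (37.71, -122.50)])
```

One channel of counts per crime type, stacked over time windows, would give the CNN both spatial and temporal structure to learn from.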
People might conclude the same thing, since more crimes are committed by males than by females. Are you saying it's unfair because an innocent male could become a suspect solely by virtue of being male? Or for a more subtle reason?
Statistics have been consistent in reporting that men commit more criminal acts than women.[1][2] Self-reported delinquent acts are also higher for men than women across many different actions.[3] Burton, et al. (1998) found that low levels of self control are associated with criminal activity.[4] Many professionals have offered explanations for this sex difference. Some differing explanations include men's evolutionary tendency toward risk and violent behavior, sex differences in activity, social support, and gender inequality.
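The fairness question above can be made concrete with Bayes' rule. The offense rates below are invented purely for illustration: a group can account for most offenders while almost every member of that group is innocent.

```python
# Invented illustrative rates -- not real statistics.
p_offend_male = 0.02      # assumed annual offense rate among men
p_offend_female = 0.005   # assumed rate among women (4x lower)
p_male = 0.5              # population share

# Share of offenders who are male (the figure crime statistics report):
p_male_given_offend = (p_offend_male * p_male) / (
    p_offend_male * p_male + p_offend_female * (1 - p_male))

# Probability that a man flagged solely for being male actually offends:
p_offend_given_male = p_offend_male

print(f"offenders who are male: {p_male_given_offend:.0%}")   # 80%
print(f"men who are offenders: {p_offend_given_male:.0%}")    # 2%
```

So with these numbers, "most offenders are male" and "a male-based suspect rule is wrong 98% of the time" are simultaneously true, which is one reason group membership alone is a poor predictor.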
OP wants an open-source, data-based statistical model of where crime might occur (methodological flaws and all), not unasked-for politicized preaching about the supposed virtues of one subset of people over another.
Or worse, it might actually mention out loud that white collar crime kills more people and costs far more money to society every year than street crime, and mention that white collar crime is completely normalized and not even seen as deviant within upper class communities. What are you supposed to do then? Actually arrest the rich for the harm they do?
There are much better ways to reduce crime than to double down on the enforcement that is already happening, which is likely all your model will tell you. “Police the neighbourhoods where people are poor.” Wow, thanks, ML!
Palantir already does all this on a massive scale for the US govt. Want to affect future crime in a positive way? Solve the problems that contribute to it.
I am currently writing my master's thesis on predictive policing using machine learning, working with the local police in Norway. I've got a bunch of papers and articles you might find interesting. Hit me up: [email protected]
I'd be interested to know how you filter out the garbage data. (From the questions here I know others are interested too, so a public response would be great.)
I know people have done these types of studies before but found that they easily became biased, and thus there is wariness about using them (like the judicial risk-assessment AI that was more likely to recommend convicting black defendants). I'm not sure how it is in Norway, but I don't expect it to be much different from America, where some places are disproportionately convicted of crimes that in other areas are treated as mere infractions. This is really going to mess with the data and perpetuate the bad system.
Do you know about the journalist who spent years obsessing about this and supposedly had some predictive success relating to serial killers?
If I recall, it was kind of a lone-wolf effort, so I don't know the rigor of his techniques; however, you never know, he might want to share results or collaborate.
Don’t have a link handy, but that should be enough info to google if you’re interested.
There is a project[1] + whitepaper[2] on projecting the likelihood of future white collar crimes written by Sam Lavigne, Francis Tseng, and Brian Clifton.
I believe I heard about a project a UW student did predicting crime in San Francisco based on the volume of vulgar tweets in a given area. Not sure if it's on GitHub anywhere, but you could always start with that idea. Nothing about the specifics of the crimes, just where a high volume of them would be located.
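That tweet-volume idea is easy to prototype. Everything below (the word list, the areas, the tweets) is invented; it just shows the aggregation step of counting "vulgar" tweets per area to use as a feature:

```python
# Count tweets containing "vulgar" words per area -- a crude signal that
# could feed a downstream model. All data here is made up.
from collections import Counter

VULGAR = {"damn", "hell"}   # stand-in word list; a real one would be larger

tweets = [
    ("soma", "damn traffic again"),
    ("soma", "what the hell"),
    ("sunset", "lovely fog today"),
]

def vulgar_counts(tweets):
    """Return per-area counts of tweets containing any vulgar word."""
    counts = Counter()
    for area, text in tweets:
        if VULGAR & set(text.lower().split()):
            counts[area] += 1
    return counts

counts = vulgar_counts(tweets)
```

In practice the areas would come from geotagged coordinates binned into grid cells, and the counts would be one feature column among many, not a predictor on their own.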
There's a British TV presenter and scientist called Hannah Fry who has published in this area, including a talk in Germany (received much like many comments on this page), some Numberphile videos, and BBC documentaries on other areas of data science.
Are you looking to predict future crimes in an area (i.e. city, neighborhood, state, etc.) or to predict whether an individual will commit future crimes?
Asdfbla | 8 years ago
https://www.youtube.com/watch?v=MfThopD7L1Y
murtali | 8 years ago
You can listen to an interview she does on econtalk -- interesting to learn more about the hidden biases.
http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.ht...
mc32 | 8 years ago
chronic193 | 8 years ago
[deleted]
bayesbiol | 8 years ago
See this tutorial given at this year's NIPS machine learning conference: http://mrtz.org/nips17/#/
jensv | 8 years ago
unknown | 8 years ago
[deleted]
USNetizen | 8 years ago
minimaxir | 8 years ago
The code is open-sourced in an R Notebook: http://minimaxir.com/notebooks/predicting-arrests/
SamReidHughes | 8 years ago
dahart | 8 years ago
https://en.wikipedia.org/wiki/Sex_differences_in_crime
jhiska | 8 years ago
otakucode | 8 years ago
michaelmcmillan | 8 years ago
Torgo | 8 years ago
lwansbrough | 8 years ago
Not that you asked.
michaelmcmillan | 8 years ago
godelski | 8 years ago
febin | 8 years ago
132121321qewdqw | 8 years ago
[deleted]
thedrake | 8 years ago
WhitneyLand | 8 years ago
noisecanceling | 8 years ago
https://www.bloomberg.com/news/features/2017-02-08/serial-ki...
He's the founder of the Murder Accountability Project:
http://www.murderdata.org
jjoonathan | 8 years ago
ryanmaynard | 8 years ago
[1] https://thenewinquiry.com/white-collar-crime-risk-zones/ [2] https://whitecollar.thenewinquiry.com/static/whitepaper.pdf
partycoder | 8 years ago
The British series "The Code" speaks a little bit about it in ep 3: https://en.wikipedia.org/wiki/The_Code_(2011_TV_series)#Stag...
zebrafish | 8 years ago
tobylane | 8 years ago
YurtleTheTurtle | 8 years ago
Food for thought on how incredibly biased these efforts can be.
paulie_a | 8 years ago
And in the case of crime, Chicago should provide a pretty good dataset.
crabl | 8 years ago
jeffmould | 8 years ago
PaulHoule | 8 years ago
chiefalchemist | 8 years ago