Ask HN: Any open source code/materials on predicting future crimes based on data?

62 points| febin | 8 years ago

56 comments

[+] Asdfbla|8 years ago|reply
In any case, you shouldn't neglect the subtle but important sources of bias those pre-crime models can have. Here's an interesting talk about it:

https://www.youtube.com/watch?v=MfThopD7L1Y

Basically, one instance of bias is the fact that many crime-prediction models are trained on police data, which means they will predict crime in places more often targeted by the police anyway. Then the model predictions even amplify that effect, since more training data may be generated from the places now more often policed, etc.
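
To make the loop concrete, here is a minimal toy sketch (not taken from the talk). Every name and number in it is an assumption for illustration: two districts with the same true crime rate, a "model" that simply flags whichever district has more recorded crime as the hotspot, and a `hotspot_share` parameter for the fraction of patrols sent there. Crime only gets recorded where patrols are looking.

```python
def simulate_feedback(initial_records, true_rate, hotspot_share, rounds):
    """Return district 0's share of all recorded crime after each round.

    Toy assumptions: both districts have the SAME true crime rate, and
    recorded crime in a district is proportional to the patrols sent there.
    """
    records = list(initial_records)
    shares = []
    for _ in range(rounds):
        # "Model": the district with more recorded crime is the predicted hotspot.
        hotspot = 0 if records[0] >= records[1] else 1
        patrols = [1.0 - hotspot_share, 1.0 - hotspot_share]
        patrols[hotspot] = hotspot_share
        # New records depend on where police look, not on any real difference.
        for district in range(2):
            records[district] += true_rate * patrols[district]
        shares.append(records[0] / sum(records))
    return shares


# Hypothetical starting point: a modest 60/40 skew in the historical records.
shares = simulate_feedback((60.0, 40.0), true_rate=100.0, hotspot_share=0.8, rounds=10)
print([round(s, 3) for s in shares])
```

In this toy setup district 0's share of the records keeps drifting upward from the initial 60% toward the patrol split, even though the underlying crime rates are identical by construction. Real systems are far messier, but the mechanism is the same.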

There are lots of resources out there on AI fairness these days. I think everyone who tries stuff like crime prediction should read up on that topic.

[+] mc32|8 years ago|reply
So if the AI identifies insider trading at trading firms, banks, etc., we should beware that this would create a feedback loop to look more into the investment and banking sectors and will ignore the mom and pop insider traders? That they go where crime is rampant over where it isn't and that could be a bad thing?
[+] bayesbiol|8 years ago|reply
Any such system is/would be potentially very dangerous. Crime data is not the same thing as crime. Populations that are over-policed are disproportionately represented in any such data set, leading to higher prediction of crime, leading in turn to more over-policing (feedback loop). I implore anyone attempting to build such a system to consider the serious issue of machine bias and its implications in the real world.

See this tutorial given at this year's NIPS machine learning conference: http://mrtz.org/nips17/#/

[+] USNetizen|8 years ago|reply
This is an area that was explored some years ago, but ultimately determined to have civil rights pitfalls. Crime reporting is only as good (or biased) as the humans that report and input the crime data. Therefore, crime "training" data for AI systems can be very biased, and using AI might only magnify those biases further - a sort of self-reinforcing feedback loop.

Having worked in law enforcement at various levels (state and federal) in a prior professional life, I can attest to the differences in what gets reported and how, based upon who was working or supervising and where they were assigned. Humans are simply not reliable reporters for this kind of data. No matter how hard we try to make the reports plain and standardized, our biases, one way or another, will always seep in.

[+] minimaxir|8 years ago|reply
Inspired by a Kaggle competition (https://www.kaggle.com/c/sf-crime), one of my older blog posts involved predicting the type of arrest in San Francisco (given that an arrest occurred) using data such as location and timing and the relatively new LightGBM machine learning algorithm: http://minimaxir.com/2017/02/predicting-arrests/

The code is open-sourced in an R Notebook: http://minimaxir.com/notebooks/predicting-arrests/

The model performance isn't great enough to usher in precrime, even in the best case. There are likely better approaches nowadays. (e.g. since the location data is spatial, a convolutional neural network might work better.)
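
For anyone wanting to see the shape of the problem before reaching for LightGBM, here is a rough sketch of a naive baseline. To be clear, this is not the code from the notebook and not a gradient-boosted model: it just counts how often each category shows up for a given (district, hour) cell and falls back to overall frequencies for unseen cells. The function names, the record layout, and the toy records are all made up for illustration.

```python
from collections import Counter, defaultdict


def fit_baseline(records):
    """Count categories per (district, hour) cell.

    `records` is an iterable of (district, hour, category) tuples;
    this layout is an assumption, not the Kaggle schema.
    """
    by_cell = defaultdict(Counter)
    overall = Counter()
    for district, hour, category in records:
        by_cell[(district, hour)][category] += 1
        overall[category] += 1
    return by_cell, overall


def predict_proba(model, district, hour):
    """Return {category: probability} for one cell, given that an arrest occurred."""
    by_cell, overall = model
    # Fall back to overall frequencies when the cell was never seen in training.
    counts = by_cell.get((district, hour)) or overall
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Made-up toy records, only to show the call pattern.
toy_records = [
    ("A", 22, "ASSAULT"),
    ("A", 22, "ASSAULT"),
    ("A", 22, "THEFT"),
    ("B", 9, "THEFT"),
]
model = fit_baseline(toy_records)
print(predict_proba(model, "A", 22))
```

A real model would need to beat something like this on held-out data to be worth the extra complexity.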

[+] SamReidHughes|8 years ago|reply
Careful! Your crime predictor might unfairly conclude that men are more likely to commit crimes than women.
[+] dahart|8 years ago|reply
People might conclude the same thing, since more crimes are committed by males than females. Are you saying it's unfair because an innocent male could become a suspect by virtue of only being male? Or for a more subtle reason?

Statistics have been consistent in reporting that men commit more criminal acts than women.[1][2] Self-reported delinquent acts are also higher for men than women across many different actions.[3] Burton, et al. (1998) found that low levels of self control are associated with criminal activity.[4] Many professionals have offered explanations for this sex difference. Some differing explanations include men's evolutionary tendency toward risk and violent behavior, sex differences in activity, social support, and gender inequality.

https://en.wikipedia.org/wiki/Sex_differences_in_crime

[+] jhiska|8 years ago|reply
OP wants an open sourced, data-based statistical model of where crime might occur (methodological flaws and all), and not an unasked-for politicized preaching about the supposed virtues of a subset of people over another subset of people.
[+] otakucode|8 years ago|reply
Or worse, it might actually mention out loud that white collar crime kills more people and costs far more money to society every year than street crime, and mention that white collar crime is completely normalized and not even seen as deviant within upper class communities. What are you supposed to do then? Actually arrest the rich for the harm they do?
[+] Torgo|8 years ago|reply
It would be right.
[+] lwansbrough|8 years ago|reply
There are much better ways to solve crime than to double down on enforcement that is already happening, which is likely all your model will tell you. “Police the neighbourhoods where people are poor” wow, thanks ML!

Palantir already does all this on a massive scale for the US govt. Want to affect future crime in a positive way? Solve the problems that contribute to it.

Not that you asked.

[+] michaelmcmillan|8 years ago|reply
I am currently writing my master's thesis on predictive policing using machine learning. Working with local police in Norway. Got a bunch of papers and articles you might find interesting. Hit me up: [email protected]
[+] godelski|8 years ago|reply
I'd be interested to know how you filter the garbage data? (By questions here I know others are interested, so public response would be great)

I know people have done these types of studies before but found that they easily became biased, and thus there is a wariness of using them (like the judge AI that was more likely to convict black people). I'm not sure how it is in Norway, but I don't expect it to be much different from America, where people in some places are disproportionately convicted of crimes, while in other areas the same offenses are treated as infractions. This is really going to mess with the data and perpetuate the bad system.

[+] febin|8 years ago|reply
Thanks, just shot you an email.
[+] WhitneyLand|8 years ago|reply
Do you know about the journalist who spent years obsessing about this and supposedly had some predictive success relating to serial killers?

If I recall, it was kind of a lone wolf effort, so I don’t know the rigor of his techniques. However, you never know, he might want to share results or collaborate.

Don’t have a link handy, but that should be enough info to google if you’re interested.

[+] jjoonathan|8 years ago|reply
Ask HN: Any open source code/materials on predicting good fall guys based on data?
[+] zebrafish|8 years ago|reply
Believe I heard about a project a UW student did predicting crime in San Francisco based on the volume of vulgar tweets in a given area. Not sure if it's on GitHub anywhere, but you can always start with that idea. Nothing about specifics of the crimes, just where a high volume of them would be located.
[+] tobylane|8 years ago|reply
There's a British TV presenter and scientist called Hannah Fry who has published in this area, including a talk in Germany (received just like many comments on this page), some Numberphile videos and BBC documentaries in other areas of data science.
[+] jeffmould|8 years ago|reply
Are you looking to predict future crimes in an area (e.g. city, neighborhood, state, etc.) or to predict whether an individual will commit future crimes?
[+] chiefalchemist|8 years ago|reply
Fwiw there's some discussion of this in the book Everybody Lies. Look into that. Perhaps follow up with the author. His name escapes me atm.