Ask HN: Any open source code/materials on predicting future crimes based on data?

[+] Asdfbla|8 years ago|reply

In any case, you shouldn't neglect the subtle but important sources of bias those pre-crime models can have. Here's an interesting talk about it:

https://www.youtube.com/watch?v=MfThopD7L1Y

Basically, one instance of bias is the fact that many crime-prediction models are trained on police data, which means they will predict crime in places more often targeted by the police anyway. Then the model predictions even amplify that effect, since more training data may be generated from the places now more often policed, etc.
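To make the amplification concrete, here is a toy sketch (not taken from the talk). It assumes two hypothetical districts with the same true crime rate, a single patrol that is always sent to the district with the most recorded crime, and crime being recorded only where the patrol is. All names and numbers are made up for illustration.

```python
def simulate_feedback(initial_records, true_rate, days):
    """Toy model of a predictive-policing feedback loop.

    Each day the single patrol goes to the district with the most recorded
    crime so far, and only crime in the patrolled district gets recorded.
    """
    records = list(initial_records)
    for _ in range(days):
        # "Prediction": the district with the most recorded crime is the hotspot.
        target = max(range(len(records)), key=lambda i: records[i])
        # Only the patrolled district generates new records.
        records[target] += true_rate[target]
    return records


# Two hypothetical districts with the SAME underlying crime rate;
# district 0 merely starts with one extra recorded incident.
records = simulate_feedback(initial_records=[11, 10], true_rate=[1.0, 1.0], days=100)
share_a = records[0] / sum(records)
print(records, round(share_a, 3))
```

Under these assumptions the recorded counts go from 11 vs 10 to 111 vs 10, even though the districts are identical by construction. Real deployments are far messier, but the direction of the effect is the point.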

There are lots of resources out there on AI fairness these days. I think everyone who tries stuff like crime prediction should read up on that topic.

[+] murtali|8 years ago|reply

Cathy O'Neil wrote a book called "Weapons of Math Destruction" -- interesting read.

You can listen to an interview she does on econtalk -- interesting to learn more about the hidden biases.

http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.ht...

[+] mc32|8 years ago|reply

So if the AI identifies insider trading at trading firms, banks, etc., we should beware that this would create a feedback loop to look more into the investment and banking sectors and will ignore the mom and pop insider traders? That they go where crime is rampant over where it isn't and that could be a bad thing?

[+] chronic193|8 years ago|reply

[deleted]

[+] bayesbiol|8 years ago|reply

Any such system is/would be potentially very dangerous. Crime data is not the same thing as crime. Populations that are over-policed are disproportionately represented in any such data set, leading to higher predictions of crime, leading in turn to more over-policing (a feedback loop). I implore anyone attempting to build such a system to consider the serious issue of machine bias and its implications in the real world.

See this tutorial given at this year's NIPS machine learning conference: http://mrtz.org/nips17/#/

[+] jensv|8 years ago|reply

Potential dangers of such a system are highlighted in the film Minority Report. https://en.wikipedia.org/wiki/Minority_Report_(film)

[+] unknown|8 years ago|reply

[deleted]

[+] USNetizen|8 years ago|reply

This is an area that was explored some years ago, but ultimately determined to have civil rights pitfalls. Crime reporting is only as good (or biased) as the humans who report and input the crime data. Therefore, crime "training" data for AI systems can be very biased, and using AI might only magnify those biases further - a sort of self-reinforcing feedback loop.

Having worked in law enforcement at various levels (state and federal) in a prior professional life, I can attest to the differences in what gets reported, and how, based upon who was working or supervising and where they were assigned. Humans are simply not reliable reporters for this kind of data. No matter how hard we try to make the reports plain and standardized, our biases, one way or another, will always seep in.

[+] minimaxir|8 years ago|reply

Inspired by a Kaggle competition (https://www.kaggle.com/c/sf-crime), one of my older blog posts involved predicting the type of arrest in San Francisco (given that an arrest occurred) using data such as location and timing and the relatively new LightGBM machine learning algorithm: http://minimaxir.com/2017/02/predicting-arrests/

The code is open-sourced in an R Notebook: http://minimaxir.com/notebooks/predicting-arrests/

The model performance isn't great enough to usher in precrime, even in the best case. There are likely better approaches nowadays. (e.g. since the location data is spatial, a convolutional neural network might work better.)
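For anyone trying this, a useful sanity check is a trivial baseline that any real model should beat. The sketch below is not the method from the linked post (which uses LightGBM in an R Notebook). It just predicts the most frequent category seen for each (district, hour) pair. The field layout and the sample rows are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_baseline(rows):
    """Learn the most frequent category for each (district, hour) pair."""
    by_key = defaultdict(Counter)
    overall = Counter()
    for district, hour, category in rows:
        by_key[(district, hour)][category] += 1
        overall[category] += 1
    # Fallback for unseen (district, hour) pairs: the overall most common category.
    fallback = overall.most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}
    return table, fallback


def predict(model, district, hour):
    """Look up the learned category, or use the fallback for unseen pairs."""
    table, fallback = model
    return table.get((district, hour), fallback)


# Hypothetical records: (district, hour of day, category).
rows = [
    ("MISSION", 23, "ASSAULT"),
    ("MISSION", 23, "ASSAULT"),
    ("MISSION", 23, "THEFT"),
    ("RICHMOND", 9, "THEFT"),
    ("RICHMOND", 9, "THEFT"),
]
model = fit_baseline(rows)
print(predict(model, "MISSION", 23))
print(predict(model, "SUNSET", 3))
```

If a gradient-boosted model or a neural network cannot clearly beat a lookup table like this on held-out data, the extra complexity is probably not buying much.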

[+] SamReidHughes|8 years ago|reply

Careful! Your crime predictor might unfairly conclude that men are more likely to commit crimes than women.

[+] dahart|8 years ago|reply

People might conclude the same thing, since more crimes are committed by males than females. Are you saying it's unfair because an innocent male could become a suspect by virtue of only being male? Or for a more subtle reason?

Statistics have been consistent in reporting that men commit more criminal acts than women.[1][2] Self-reported delinquent acts are also higher for men than women across many different actions.[3] Burton, et al. (1998) found that low levels of self control are associated with criminal activity.[4] Many professionals have offered explanations for this sex difference. Some differing explanations include men's evolutionary tendency toward risk and violent behavior, sex differences in activity, social support, and gender inequality.

https://en.wikipedia.org/wiki/Sex_differences_in_crime

[+] jhiska|8 years ago|reply

OP wants an open-source, data-based statistical model of where crime might occur (methodological flaws and all), not unasked-for politicized preaching about the supposed virtues of one subset of people over another.

[+] otakucode|8 years ago|reply

Or worse, it might actually mention out loud that white collar crime kills more people and costs far more money to society every year than street crime, and mention that white collar crime is completely normalized and not even seen as deviant within upper class communities. What are you supposed to do then? Actually arrest the rich for the harm they do?

[+] michaelmcmillan|8 years ago|reply

Reductio ad absurdum of the bias argument (-:

[+] Torgo|8 years ago|reply

It would be right.

[+] lwansbrough|8 years ago|reply

There are much better ways to solve crime than to double down on enforcement that is already happening, which is likely all your model will tell you. “Police the neighbourhoods where people are poor” wow, thanks ML!

Palantir already does all this on a massive scale for the US govt. Want to affect future crime in a positive way? Solve the problems that contribute to it.

Not that you asked.

[+] michaelmcmillan|8 years ago|reply

I am currently writing my master's thesis on predictive policing using machine learning, working with local police in Norway. Got a bunch of papers and articles you might find interesting. Hit me up: [email protected]

[+] godelski|8 years ago|reply

I'd be interested to know how you filter the garbage data. (Judging by the questions here, others are interested too, so a public response would be great.)

I know people have done these types of studies before but found that they easily became biased, and thus there is a wariness of using them (like the judge AI that was more likely to convict black people). I'm not sure how it is in Norway, but I don't expect it to be much different from America, where people in some places are disproportionately convicted of crimes, while in other areas the same acts are treated as infractions. This is really going to mess with the data and perpetuate the bad system.

[+] febin|8 years ago|reply

Thanks, just shot an email.

[+] 132121321qewdqw|8 years ago|reply

[deleted]

[+] thedrake|8 years ago|reply

A lot of good work by Cynthia Rudin http://online.liebertpub.com/doi/pdf/10.1089/big.2014.0021 and her tools are open sourced (her papers https://users.cs.duke.edu/~cynthia/papers.html and tools https://users.cs.duke.edu/~cynthia/code.html)

[+] WhitneyLand|8 years ago|reply

Do you know about the journalist who spent years obsessing about this and supposedly had some predictive success relating to serial killers?

If I recall, it was kind of a lone wolf effort, so I don’t know the rigor of his techniques; however, you never know if he might want to share results or collaborate.

Don’t have a link handy, but that should be enough info to google if you’re interested.

[+] noisecanceling|8 years ago|reply

I think you're referring to this article about Thomas Hargrove that was in Bloomberg in February:

https://www.bloomberg.com/news/features/2017-02-08/serial-ki...

He's the founder of the Murder Accountability Project:

http://www.murderdata.org

[+] jjoonathan|8 years ago|reply

Ask HN: Any open source code/materials on predicting good fall guys based on data?

[+] ryanmaynard|8 years ago|reply

There is a project[1] + whitepaper[2] on projecting the likelihood of future white collar crimes written by Sam Lavigne, Francis Tseng, and Brian Clifton.

[1] https://thenewinquiry.com/white-collar-crime-risk-zones/ [2] https://whitecollar.thenewinquiry.com/static/whitepaper.pdf

[+] partycoder|8 years ago|reply

https://en.wikipedia.org/wiki/Predictive_policing

The British series "The Code" speaks a little bit about it in ep 3: https://en.wikipedia.org/wiki/The_Code_(2011_TV_series)#Stag...

[+] zebrafish|8 years ago|reply

Believe I heard about a project a UW student did predicting crime in San Francisco based on volume of vulgar tweets in a given area. Not sure if it's on github anywhere but you can always start with that idea. Nothing about specifics of the crimes, just where a high volume of them would be located.

[+] tobylane|8 years ago|reply

There's a British TV presenter and scientist called Hannah Fry who has published in this area, including a talk in Germany (received just like many comments on this page), some Numberphile videos and BBC documentaries in other areas of data science.

[+] YurtleTheTurtle|8 years ago|reply

https://www.propublica.org/article/machine-bias-risk-assessm...

Food for thought on how incredibly biased these efforts can be.

[+] paulie_a|8 years ago|reply

For a source of data: https://data.cityofchicago.org/

And in the case of crime, Chicago should be a pretty good dataset.

[+] crabl|8 years ago|reply

https://github.com/kandluis/crime-prediction is a good place to start

[+] jeffmould|8 years ago|reply

Are you looking for predicting future crimes in an area (i.e. city, neighborhood, state, etc...) or predicting whether an individual will commit future crimes?

[+] PaulHoule|8 years ago|reply

https://en.wikipedia.org/wiki/CompStat

[+] chiefalchemist|8 years ago|reply

Fwiw there's some discussion of this in the book Everybody Lies. Look into that. Perhaps follow up with the author. His name escapes me atm.

56 comments