In August 2013, Robert McDaniel of Chicago, Illinois, was visited by the police. Although McDaniel lived in an area well known for violence, he had never committed a crime, nor had he recently spoken with the police. The reason for the visit? McDaniel had been placed on a computer-generated “heat list” of people “most likely to commit a violent crime.” His only real “crime” was being a 22-year-old black high school dropout from a poor neighborhood. While a computer algorithm may simply be the sum of what others put into it, the case raises a question: is it ethical to use predictive policing analytics when the underlying data may introduce racial biases? When we look at the consequences, it becomes clear that predictive policing tools need stricter guidelines, and safeguards against biased data, if we wish to use them as an ethical means of crime prevention.
The use of software and statistics to track crime is more than 20 years old. The New York Police Department pioneered the practice, developing software known as CompStat to “map crime statistics along with other indicators of problems, such as locations of crime victims and gun arrests.” CompStat eventually became the standard tool for tracking a police department’s performance and attacking spikes in crime at specific locations. The important distinction between CompStat and today’s predictive policing techniques lies in the volume of data available and the ease with which data scientists can draw meaningful conclusions from it. CompStat simply tracked and analyzed the performance of a police force by recording crime in certain areas. Now, machine learning and refined algorithms allow police to track both individuals and areas with greater accuracy in order to predict when, where, and by whom a crime may be committed. Given a “training” data set of thousands of crime reports, a model classifies crime based on patterns discovered in the data and yields an algorithm for predicting where a crime will occur. To predict who will commit a crime, each individual’s social contacts, from neighbors to cell phone history and even Facebook friends, are analyzed. Those with numerous contacts who have committed violent crimes are considered more likely to be involved in a violent crime themselves. These processes result in increased monitoring of what police departments call “hot spot” areas that are susceptible to crime.
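The contact-based scoring described above can be illustrated with a minimal sketch. Everything here is hypothetical: the function name, the data shapes, and the simple “count contacts with violent records” rule are illustrative stand-ins, not the logic of any real police system, which is far more complex and, as discussed later, undisclosed.

```python
# Hypothetical sketch of contact-based risk scoring: a person's score
# rises with the number of social contacts who have violent-crime records.
# The rule and data format are illustrative only.

def contact_risk_score(person, records):
    """Count how many of a person's contacts appear in violent-crime records."""
    violent = {name for name, crime_type in records if crime_type == "violent"}
    return sum(1 for contact in person["contacts"] if contact in violent)

records = [("A", "violent"), ("B", "property"), ("C", "violent")]
person = {"name": "X", "contacts": ["A", "B", "D"]}
print(contact_risk_score(person, records))  # 1: only contact "A" has a violent record
```

Even this toy version shows the essay’s concern in miniature: the score says nothing about the person’s own conduct, only about who they know and what the records happen to contain.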
These newer methods of predictive policing were created with the dual intent of reducing crime and making police officers’ jobs easier. Despite the lack of malicious intent, this use of training data to find patterns has produced results that many deem racially biased. A study of Chicago’s Strategic Subject List program, which was designed to identify the people most likely to become homicide victims, found that the individuals the list singled out were no more likely to become homicide victims than a random comparison group. They were, however, more likely to be arrested for a shooting. This was interpreted to mean that police were using the list to follow leads in shooting cases and, in turn, arresting more people from the homicide-victim watch list. In fact, Jessica Saunders, a criminologist at RAND, a nonprofit research organization that helps create predictive policing software, acknowledges that “race does impact arrest… It’s a very valid concern.” Saunders goes on to say that “attributes related to race are a part of the model” and play a role in determining whether a person is judged likely to commit a crime. This is concerning for a variety of reasons.
To become an ethical data scientist, according to the Modelers’ Hippocratic Oath by quant Emanuel Derman, one should aim to “improve the world, not repeat it.” If the data fed into machine learning systems uses indicators associated with race, the source of the inherent bias in the resulting algorithm is not the computer. Rather, it is the data collection and input process, a process under the control of data scientists. If a data scientist suspects that the resulting algorithm contains a racial bias, it is his or her duty to “inform the client and to advise them against using the algorithm.” In the case of predictive policing, the known concerns about racial bias should caution both police and data scientists against using the technology until a method can be developed that predicts crime without relying on data associated with race. Failing to do so could perpetuate discrimination against racial minorities and damage public opinion of law enforcement. It could also inhibit the critical thinking of police officers, who could become overly dependent on this type of software. The potential consequences of predictive policing show the necessity for reform, as well as for increased precautions in both the modeling and data collection stages.
However, even with the necessary precautions, predictive policing not only has more room for error than typical forms of predictive analytics; it also creates a high probability of a feedback loop in which already-monitored areas are monitored even more heavily. William Isaac, an analyst with the Human Rights Data Analysis Group and a Ph.D. candidate at Michigan State University in East Lansing, states, “They’re not predicting the future. What they’re actually predicting is where the next recorded police observations are going to occur.” Herein lies the key problem with using predictive analytics to anticipate crime: the data being provided is flawed. A study by the Human Rights Data Analysis Group found that an algorithm used to target drug offenses in Oakland, California, concentrated on areas with higher proportions of Latino and Black residents, despite evidence that drug usage is spread fairly evenly across Oakland’s neighborhoods. This happened because police had historically targeted areas where minorities reside, producing skewed data and a biased algorithm that pointed back to the areas where police already tended to make the most drug arrests. The system feeds into itself: targeting minority neighborhoods leads to more arrests there, which in turn leads to still more targeting of the same neighborhoods. This is what happened in Oakland, and it is likely to happen again in other large cities across the country.
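The feedback loop just described can be made concrete with a toy simulation. All numbers and parameter names here are invented for illustration: two neighborhoods have the same true offense rate, but one starts with more recorded arrests, patrols are allocated in proportion to recorded arrests, and new arrests can only be recorded where patrols actually go.

```python
# Toy feedback-loop simulation (illustrative only): recorded arrests drive
# patrol allocation, and patrol presence drives new recorded arrests,
# so an initial skew in the records reproduces itself indefinitely.

def simulate(recorded, true_rate=0.5, rounds=10, patrols=10):
    for _ in range(rounds):
        total = sum(recorded)
        for area in range(len(recorded)):
            share = recorded[area] / total          # patrols follow past data
            observed = patrols * share * true_rate  # arrests require presence
            recorded[area] += observed
    return [round(r, 1) for r in recorded]

print(simulate([30.0, 10.0]))  # → [67.5, 22.5]
```

Although both neighborhoods offend at exactly the same rate, the initial 3:1 skew in the records never corrects itself, and the absolute gap between the two areas keeps widening, which is precisely Isaac’s point: the model predicts future *records*, not future crime.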
To reform the use of racially biased data, we must create guidelines for all who use these tools and exercise transparency with the public so that citizens can keep police power in check. To begin, predictive policing technology should randomize the areas that police officers visit in order to build a more complete picture of which areas are truly “high risk.” While some software, including HunchLab, already does this, randomization needs to become standard across all predictive policing software to avoid “over-policing areas that appear high risk because of biased crime data.” Randomizing would not only produce a more balanced crime map but also keep police active in all parts of the community, and an improved map of arrests would increase the accuracy of predictive policing methods. Additionally, predictive policing programs need to start tracking the effect of race in the areas targeted by this software, and the results need to be shared with the public. This can be done by tracking the number of arrests in each neighborhood, along with the disparities between communities of different races. If an outside body, such as an ethical review board, finds that the data collected is overly biased, it should be able to require appropriate changes to the system’s algorithm and to the police department’s racial profiling policy. Even with these precautions, it may be impossible to obtain truly race-neutral data, because race, neighborhood, and income are closely intertwined. However, through stronger checks and a more coherent crime map, we can know with greater certainty what kind of impact this software has on the entire surveyed community.
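One simple way to realize the randomization proposed above is to blend the model’s risk scores with a uniform random component, so that every area is sometimes visited regardless of its score. This is only a sketch of the idea; the mixing weight `epsilon`, the function name, and the area scores are all made up here and are not drawn from HunchLab or any real product.

```python
import random

# Illustrative patrol randomization: with probability epsilon, pick any
# area uniformly at random (exploration); otherwise, pick an area with
# probability proportional to its risk score (following the model).
# epsilon=0.2 is an arbitrary illustrative value.

def pick_patrol_area(risk_scores, epsilon=0.2, rng=random):
    areas = list(risk_scores)
    if rng.random() < epsilon:
        return rng.choice(areas)              # explore: uniform over all areas
    total = sum(risk_scores.values())
    weights = [risk_scores[a] / total for a in areas]
    return rng.choices(areas, weights)[0]     # exploit: follow the model

scores = {"north": 8.0, "south": 1.0, "east": 1.0}
picks = [pick_patrol_area(scores) for _ in range(1000)]
# high-scoring areas still dominate, but low-scoring areas get regular
# visits, producing arrest data from the whole city rather than only
# from the neighborhoods the model already flags
```

The design trade-off is the one the essay describes: a higher `epsilon` yields a more balanced crime map at the cost of deploying fewer patrols where the model predicts risk.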
Additionally, transparency in the public sphere is paramount, as public debate would help determine which criteria and which types of algorithms police should use. Currently, police departments have refused to disclose the specifics of the programs and algorithms they use, citing the need to “protect” officers in the field who may be using the technology. This lack of transparency breeds a lack of confidence in the predictive systems. Public confidence in the technology is critical because it allows people to feel unthreatened by the technology and by the police. Failing to open up to the public is an ethical failure that not only keeps the public in the dark but also impedes the progress of the technology itself. If the floor were open to public debate, society could ensure that the criteria used to judge an individual’s risk level are in no way discriminatory. Academics could write ethics papers, similar to this one, examining how the software uses data and determining whether the public has any need for concern. Journalists could investigate the true reach of predictive policing by reporting on how communities are affected. Many of the articles written online are purely speculative, but if the technology were open to inspection, we could truly know whether there is any need to question its use.
As the technology becomes more widespread, knowing how to deal with its ethical implications is paramount to its success in reducing crime ethically. If studies continue to find inherent racial biases in the algorithms, then we will continue to see unfairness in the identification and treatment of crime suspects, and public suspicion of policing techniques will continue to grow. Transparency about the types of data being used allows suspicion to subside and public opinion to improve. The ethical use of data not only prevents racially biased data from turning into racially biased algorithms; it also allows the policing community to enter the era of Big Data with open-mindedness. As crime-prediction tools find ever more applications, ethics becomes a priority. Guidelines for how we treat our communities are essential if we wish to overcome racial biases. While malicious intent may not be present, data scientists must be cautious in their use of data and have a firm understanding of how they are affecting their communities. If we wish to live in a world free from discrimination, we must continue to strive toward ethical data collection and transparent algorithms within the policing community. Without both, we will continue down the road of racially biased policing.
By Jeremy Panelli, Viterbi School of Engineering, University of Southern California
M. Stroud, “The Minority Report: Chicago’s New Police Computer Predicts Crimes, but Is It Racist?” The Verge. N.p., 2017. Web. 7 Mar. 2017.
“Predictive Policing: What It Is, How It Works, and Its Legal Implications.” The Centre for Internet and Society. N.p., 2017. Web. 7 Mar. 2017.
M. Hvistendahl, “Can ‘Predictive Policing’ Prevent Crime Before It Happens?” Science | AAAS. N.p., 2017. Web. 7 Mar. 2017.
D. Baer, “Predictive Policing Isn’t an Exact Science, and That’s the Problem.” Science of Us. N.p., 2017. Web. 7 Mar. 2017.
C. O’Neil, “How to Bring Better Ethics to Data Science.” Slate Magazine. N.p., 2017. Web. 7 Mar. 2017.
“Code of Conduct.” Datascienceassn.org. N.p., 2017. Web. 7 Mar. 2017.
D. Morris, “Predictive Policing Did Nothing to Prevent Violence in Chicago.” Fortune.com. N.p., 2017. Web. 5 Mar. 2017.
A. Shapiro, “Reform Predictive Policing.” Nature.com. N.p., 2017. Web.