The Biggest Danger of AI Isn't Skynet — It's Human Bias That Should Scare You

Fears of artificial intelligence run amok make for compelling apocalypse narratives, but the real dangers of artificial intelligence come more from humans than machines.

Over the last few years, there has been a lot of talk about the threat of artificial general intelligence (AGI). An AGI is a system that is able to understand or learn any intellectual task that a human can, and it is often imagined as an artificial superintelligence. Prominent voices from seemingly every sector of society have spoken out about these kinds of AI systems, depicting them as Terminator-style robots that will run amok and cause massive death and destruction.

Elon Musk, the SpaceX and Tesla CEO, has frequently railed against the creation of AGI and cast such superintelligent systems in apocalyptic terms. At SXSW in 2018, he called digital superintelligences "the single biggest existential crisis that we face and the most pressing one," and he said these systems will ultimately be more deadly than a nuclear holocaust. The late Stephen Hawking shared these fears, telling the BBC in 2014 that "the development of full artificial intelligence could spell the end of the human race."

Many computer science experts agree. Case in point: Stuart Russell, a professor of computer science and the Smith-Zadeh Professor in Engineering at the University of California, Berkeley, appeared in a short film that warned of the danger of slaughterbots, weapons that use artificial intelligence to identify and kill without human intervention. The purpose of the film? To scare people into taking action to ban superintelligent autonomous weapons systems.

But few people rail against the ways that AI systems are already harming humans. And yes, AI systems are already causing incredible harm in the world, but not for reasons you might think.

It's not the threat of an all-powerful superintelligence lording over humanity or turning the entire world's resources to making paperclips that we need to worry about the most right now. Such a reality is still a distant concern. It's the human institutions that already use AI that we need to interrogate and examine today, as many use cases are already harming humans in real and dramatic ways.

There's a saying in statistics and data science: garbage in, garbage out.

This is a serious problem when talking about machine learning algorithms. Deep learning is a family of machine learning methods that use neural networks to "teach" a computer algorithm to recognize patterns. This pattern matching is how computers are able to recognize and identify songs after hearing only a couple of seconds of music, detect speech and transcribe the words a speaker is saying, and even generate deep fakes.

All deep learning methods — indeed, all machine learning methods — begin with data.

Whenever you hear a news report of a researcher scraping a website like Facebook for public photos to use for some kind of facial recognition program, the pictures are the data that the machine learning algorithm is being trained on. Fortunately, once the images are fed through the machine learning algorithm, they are typically deleted, since they aren't useful for the data scientists any longer. People often complain about the effects of this on privacy, but to see the problem, we need to go back to that old saying: garbage in, garbage out.

That's not to say your lovely selfie is garbage, necessarily. But what if the majority of selfies that are fed into the algorithm depict predominantly light-skinned, "white" faces? Well, then that algorithm will become very good at detecting those kinds of faces. So, how do you think it would do when tasked to detect and identify darker-skinned, "black and brown" faces? In this respect, we could say that the algorithm has picked up a bias towards identifying lighter-skinned faces.

What about loan applications? If you were to feed every loan application on record into a machine learning algorithm, along with whether or not that application was approved or rejected, then your machine learning algorithm would be very good at accepting the kinds of loan applications that have been previously accepted and rejecting those that have been previously rejected.

But what if the data you fed it consisted largely of 1) rejected loan applications from minority applicants with impeccable credit records and 2) accepted applications from white applicants with less than impeccable credit? If that were the data used, then the algorithm would be inadvertently trained to home in on the race of the applicants, rather than their credit scores, and to assume that people from minority backgrounds or with darker skin tones should be rejected, since that seems to be the underlying pattern of the loan approval process. And the algorithm wouldn't be wrong in detecting that. In fact, it would be doing exactly what its human creators trained it to do.
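
To make the loan scenario concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the eight records are invented, and the "model" is a toy one-rule learner (it predicts the majority outcome for each value of a single feature), not any real lender's system. It only shows that when past approvals track race rather than credit, the race-based rule fits the history best.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made records: (group, good_credit, approved).
# The history is deliberately biased: approval tracks group, not credit.
HISTORY = [
    ("minority", True, False),
    ("minority", True, False),
    ("minority", True, False),
    ("minority", False, False),
    ("white", False, True),
    ("white", False, True),
    ("white", True, True),
    ("white", False, True),
]


def fit_one_rule(records, feature_index):
    """Learn the majority outcome for each value of a single feature."""
    outcomes = defaultdict(Counter)
    for record in records:
        outcomes[record[feature_index]][record[2]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in outcomes.items()}


def accuracy(records, feature_index, rule):
    """Fraction of past decisions that the learned rule reproduces."""
    hits = sum(rule[record[feature_index]] == record[2] for record in records)
    return hits / len(records)


group_rule = fit_one_rule(HISTORY, 0)   # rule based on the applicant's group
credit_rule = fit_one_rule(HISTORY, 1)  # rule based on the credit record
group_acc = accuracy(HISTORY, 0, group_rule)
credit_acc = accuracy(HISTORY, 1, credit_rule)

print("group rule: ", group_rule, group_acc)
print("credit rule:", credit_rule, credit_acc)
```

On this invented history, the group rule reproduces every past decision (accuracy 1.0), while the credit rule manages only 0.75 and even learns to reject applicants with good credit. A learner that simply maximizes agreement with the past would prefer the group rule.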

Things like this are already happening.

What about law enforcement? Since the 1990s, police departments the world over have relied on crime statistics to produce a "predictive policing" model for law enforcement, essentially to place police resources in the areas where the data says "most of the crime" takes place. But if most of your police resources are directed to a specific area, perhaps an area where minorities live, then you are also more likely to find crime in that area.

If that data is then fed into the "predictive" algorithm, it is going to find that more crime occurs in that area, so it will send more resources to that area, which will lead to more crime being found. This feedback loop doesn't reflect where crime is actually occurring; it reflects where the police are finding crime, which is a subtle but important difference. Since police have traditionally patrolled poor and minority neighborhoods more frequently, the data is going to skew towards these areas, which in turn reinforces the policing bias towards them.
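
The feedback loop can be sketched as a toy simulation. The numbers, the area names, and the reallocation rule (move five patrols per round toward whichever area has more recorded incidents) are all assumptions made up for illustration; real predictive policing systems are more complicated. The key assumption is that both areas have exactly the same underlying crime rate, so any difference in the records comes only from where the patrols are.

```python
# Assumption: each patrol records the same number of incidents in either area,
# i.e. the true crime rate is identical everywhere.
INCIDENTS_PER_PATROL = 2
PATROLS_MOVED_PER_ROUND = 5


def simulate(rounds, initial_patrols):
    """Return (final patrol allocation, recorded incidents) after `rounds`."""
    patrols = dict(initial_patrols)
    recorded = {area: 0 for area in patrols}
    for _ in range(rounds):
        # Police only record the crime they are present to find.
        for area in patrols:
            recorded[area] += patrols[area] * INCIDENTS_PER_PATROL
        # The "predictive" step: shift patrols toward the area with more records.
        hot = max(recorded, key=recorded.get)
        cold = min(recorded, key=recorded.get)
        shift = min(PATROLS_MOVED_PER_ROUND, patrols[cold])
        patrols[cold] -= shift
        patrols[hot] += shift
    return patrols, recorded


final_patrols, final_records = simulate(10, {"north": 55, "south": 45})
print(final_patrols)   # {'north': 100, 'south': 0}
print(final_records)   # {'north': 1550, 'south': 450}
```

A small initial skew of 55 patrols to 45 ends with every patrol in one area and a record that appears to justify it, even though the simulated crime rate never differed. Starting from an even 50/50 split, the same code leaves the allocation unchanged.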

Again, this is not an imagined, fictional future we are talking about. These biased algorithms already exist, and they are being used in police departments around the world.

In the case of policing, the harm caused by a machine learning model is apparent. Rates of drug use are nearly identical across racial and income groups, but predictive policing predominantly directs police resources to poor and minority neighborhoods, leading to disproportionate arrests and ruined lives.

The same goes for facial recognition. If law enforcement uses a facial recognition system to identify criminal suspects, and that algorithm is not well-trained in recognizing dark-skinned faces, it is going to produce a larger number of false positives for those faces. If a disproportionate number of suspects are misidentified by the facial recognition algorithm, and those misidentifications lead to an arrest (or worse, a conviction), then that algorithm is self-reinforcing and isn't just wrong, but dangerous.
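
One way to check a system for this kind of disparity is to compare false positive rates across groups. The sketch below is illustrative only: the audit log is invented, and the group labels and field layout are assumptions rather than the format of any real facial recognition product. A false positive here means the system declared a match when the two faces belonged to different people.

```python
from collections import defaultdict

# Hypothetical audit log: (group, system_said_match, truly_same_person).
AUDIT_LOG = [
    ("lighter", True, True),
    ("lighter", False, False),
    ("lighter", False, False),
    ("lighter", False, False),
    ("lighter", True, False),   # false positive
    ("darker", True, True),
    ("darker", True, False),    # false positive
    ("darker", True, False),    # false positive
    ("darker", True, False),    # false positive
    ("darker", False, False),
]


def false_positive_rate_by_group(results):
    """Share of true non-matches that the system wrongly flagged, per group."""
    false_positives = defaultdict(int)
    non_matches = defaultdict(int)
    for group, predicted_match, actual_match in results:
        if not actual_match:
            non_matches[group] += 1
            if predicted_match:
                false_positives[group] += 1
    return {group: false_positives[group] / non_matches[group] for group in non_matches}


rates = false_positive_rate_by_group(AUDIT_LOG)
print(rates)   # {'lighter': 0.25, 'darker': 0.75}
```

In this made-up log the system wrongly flags three of every four non-matching darker-skinned faces, compared with one in four for lighter-skinned faces. A single overall accuracy figure would hide that gap.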

This is even more of an issue because of how we have approached machine learning over the years: we've treated it as if it weren't biased.

If your loan application was rejected, it wasn't because the loan officer was racist; it was because the algorithm said to reject you, and so you were rejected. If your community is being overpoliced, it's not necessarily because the current cops are racist; it's because the algorithm told police that your neighborhood had a higher rate of crime, priming the officers to believe that there were more criminals in your neighborhood. If you've been arrested for a crime you did not commit, it's not necessarily that police officers or witnesses misidentified you because of unconscious racial biases; it's because an artificial intelligence matched your face to grainy security camera footage of someone committing a crime. Of course, that unconscious bias may also have made the witnesses and officers more likely to accept that grainy footage and AI matching at face value.

In every one of these instances, you have replaced the human being with a machine learning algorithm, yet somehow the same systemic patterns of discrimination and persecution of poor and minority populations that have been documented for decades magically reproduce themselves in artificial intelligence. However, because we treat artificial intelligence as if it does not have human biases, we take its word for it, leading to the same systemic biases we were "trying" to avoid.

Can Better Data Help?

Is the problem a matter of using better data, or are we simply applying a Band-Aid to a gaping social wound and hoping that the problem fixes itself?

Certainly, accounting for biases in machine learning training data will produce better models, but they won't be perfect. We'll never be able to fully neutralize the models, and there's every reason to ask whether that should be the goal at all. Rather than simply trying to make unbiased AI systems, which may be an impossible task, perhaps we need to question the underlying things we are trying to do and whether they are truly necessary and assisted by AI.

Rashida Richardson, a lawyer and researcher who studies algorithmic bias at Rutgers Law School in New Jersey, asserts that the solution is clear: rather than trying to paper over this history of abuse with a fig leaf of improved, "impartial" machine learning, our efforts would be better directed towards the root problems that artificial intelligence is trying to ameliorate. In other words, we need to focus on fixing the current problems in our social systems. Then we can focus on creating viable AI tools.

Perhaps, someday in the distant future, we will need to start worrying about Terminator-style AGI. But for now, the fearmongering isn't helping and only distracts from the conversations we should be having about the actual harm caused by AI.