What data should AI be trained on to avoid bias?
Humans introduce their own biases and prejudices into machine learning. However advanced AI may be, it is built by humans and can share some of our ethical shortcomings. Using proper databases during training is one way to help prevent biases from developing within artificial intelligence.
As AI and machine learning permeate every sphere of our lives today, it gets easier to celebrate these technologies. From entertainment to customer support to law enforcement, they provide humans with considerable help. Certain things they are capable of are so amazing that they seem almost like magic to an outside observer.
However, it’s necessary to remember that as astonishing as machine learning-powered tech advancements are, they are still a product created by us, humans. And we can’t simply shed our personalities when developing anything, much less an AI – an algorithm that has to think on its own. While developers’ personal experiences and beliefs are an indispensable asset in creating ML algorithms, alas, they come at a cost sometimes.
A brief overview of bias in AI
Sadly, no AI can stay 100% impartial. Like any product made by humans, and especially one as sophisticated as a machine learning algorithm, it will always carry some bias.
Over the last few years, artificial intelligence has exhibited very human prejudice more than once.
In cases where AI is used by the police, bias can lead to dire consequences. A 2019 study by the UK’s Royal United Services Institute for Defence and Security Studies (RUSI) paints a grim picture. It examines bias in the machine learning that police forces use for data analysis.
Because those algorithms are trained on databases compiled by the police, they are bound to share the police force’s biases. The paper quotes a police officer as saying that “young black men” are stopped by the police more frequently than Caucasian men of the same age group. An AI trained on reports that reflect this pattern will also treat the black population as more likely to commit crimes and analyze data accordingly, thus perpetuating the oppression.
Racial and gender biases are not the only ones that plague AI. The same machine learning algorithms used by the police forces of England and Wales create another situation in which overreliance on them can lead to devastating consequences. An example the RUSI paper gives is AI assigning risk categories to individuals who have had problems with the law: someone whose likelihood of returning to a life of crime is rated “low” may still need additional help and guidance to avoid another slip. Machine learning algorithms do not fully understand that, and by labeling such an individual “low-risk” they give the police a false sense of safety.
Similarly, AI biases are dangerous in cyberbullying prevention. In this sphere of data analysis, context plays a huge part: the same neutral terms and phrases are often used by both hate groups and support communities.
Facial recognition algorithms are another example. As it turns out, they have a harder time distinguishing the faces of African American people than those of Caucasians.
Interestingly enough, the recognition AI in question wasn’t developed by amateurs, nor was it an isolated case: programs developed and sold by Amazon, Microsoft, and IBM all showed signs of racial bias, according to research conducted last year.
The research concluded that this state of affairs is caused by the overwhelming majority of employees at those companies being Caucasian. However, this alone shouldn’t make AI unable to recognize the faces of people of other ethnicities. A much bigger problem is the data that AI is trained on.
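A disparity like the one described for facial recognition only becomes visible when accuracy is measured separately for each group rather than as one overall number. The following is a minimal sketch of such a check in Python. The function name, the group labels, and the toy results are all hypothetical and made up for illustration; they are not taken from the research mentioned above.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return the share of correct predictions for each group.

    `records` is an iterable of (group, predicted, actual) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Invented toy results, for illustration only.
toy_results = [
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_a", "match", "no_match"),
    ("group_b", "match", "no_match"),
    ("group_b", "no_match", "match"),
    ("group_b", "match", "match"),
    ("group_b", "no_match", "no_match"),
]

print(accuracy_by_group(toy_results))
# {'group_a': 0.75, 'group_b': 0.5}
```

In this toy data the overall accuracy is 62.5%, a single figure that hides the gap between the two groups.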
The necessity of proper databases in AI training
Even in a hypothetical situation where just one person develops artificial intelligence algorithms, the problem with bias can be avoided by having sufficient databases to train the algorithms on. As long as every ethnicity and race is properly represented in the database, there should be no issue.
However, what proportional representation is fair? After all, African Americans account for only 12.7% of the US population. It’s obvious, though, that if they are represented in an AI training database in that exact proportion, the facial recognition algorithm is going to be less precise for the black population.
Therefore, paradoxically, to ensure that AI doesn’t discriminate against minorities, they need to be overrepresented in the databases.
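To make the idea of overrepresentation concrete, here is a minimal sketch in Python of one possible approach: giving each training sample a weight so that every group contributes equally in total. Equal contribution per group is an assumption chosen for illustration, not a recommendation from any of the studies cited here, and the function name and group labels are hypothetical. The toy database mirrors the 12.7% share mentioned above.

```python
from collections import Counter


def balancing_weights(groups):
    """Return one weight per sample so that each group's weights
    add up to the same total (an assumed balancing target)."""
    counts = Counter(groups)
    n_samples = len(groups)
    n_groups = len(counts)
    # Samples from smaller groups receive proportionally larger weights.
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Hypothetical database of 1,000 samples, 12.7% from the minority group.
toy_groups = ["minority"] * 127 + ["majority"] * 873
weights = balancing_weights(toy_groups)

minority_total = sum(w for w, g in zip(weights, toy_groups) if g == "minority")
majority_total = sum(w for w, g in zip(weights, toy_groups) if g == "majority")
print(round(minority_total), round(majority_total))
```

With these weights each minority sample counts roughly seven times as much as a majority sample. Weights of this kind can be used either to resample the database or as per-sample weights during training, where the training tool supports them.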
The situation with police databases is harder because they are directly based on officers’ behavior, which can be skewed against a particular minority or another subset of the population. It appears that strict control is needed over which information acquired by the police ends up in the training database.
However, the criteria for such control are almost impossible to determine, because those who develop and apply them are bound to be subject to their own biases. So the best solution for AI bias in law enforcement is, ironically, not to use AI in that sphere, given the potential problems it causes for the police force and the population.