Udesh Habaraduwa
3 min read · Aug 20, 2019

Despite what the internet echo chamber will try to tell you, there are no ‘racist facts’ or ‘racist data’ unless the implication is that the data has been manipulated to overrepresent a specific population. If an algorithm is trained with race, sex, religion, etc. in the training data, then there is a claim to be made.

However, if an algorithm is given data with no demographic information and it still classifies a subset of the samples as higher risk, it is not doing so based on a demographic; it never had that information. It is doing so based on the information it was given. We are assigning race to that subset after the fact, as an observation.

Suppose we want to predict the probability that a person will pay back a loan, with no information about their race, sex, religion, etc. taken into account. If the model still assigns them a low probability AND the person happens to belong to a ‘protected class’, information the algorithm was never trained on, you cannot say that the algorithm was biased against the protected class. That is simply not the same thing. If the model in question, and others like it, consistently produce output that appears to be biased against a ‘protected class’ it had no knowledge of to begin with, that is seriously unfortunate, but it is not a consequence of the algorithm discriminating against a protected class.
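To make that concrete, here is a minimal sketch of the setup I am describing. Everything in it is hypothetical: the column names (income, debt_to_income, years_employed), the synthetic data, and the choice of scikit-learn’s logistic regression as a stand-in for whatever model a lender would actually use. The point is simply that the protected attribute is never among the inputs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical applicant records. 'group' stands in for a protected
# attribute that is recorded for auditing only and never shown to the model.
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "debt_to_income": rng.uniform(0.0, 0.6, n),
    "years_employed": rng.integers(0, 30, n),
    "group": rng.choice(["A", "B"], n),
    "repaid": rng.integers(0, 2, n),          # 1 = paid the loan back
})

# The model is fit on financial features only; 'group' is excluded.
features = ["income", "debt_to_income", "years_employed"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["repaid"], test_size=0.2, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```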

If we have training data, in the housing loan example, of a group of people who have reliably paid back their loans, controlling for race, sex, religion, etc., meaning the effect of these variables has been neutralized, then this concern should be alleviated. However, one might claim that a certain demographic majority is overrepresented in the collected data itself: demographic A (which is, again, information that is not collected) may make up more of the dataset than demographic B (again, information that is not collected).

There are ways to control for this. For example, we can collect equal amounts of data from each demographic that fits the class of ‘pays back loans on time’. After doing all of this, remember, the goal is to predict the probability that a new applicant will pay the loan back. The new applicant’s information, sans race, sex, religion, etc., is provided to the model and a result is returned. If the model STILL predicts a low probability AND the applicant is a member of the ‘protected class’, how can you make the claim that the model is biased? Furthermore, the case can be strengthened by testing the model against a dataset of an ‘unprotected class’ of the population, using the same features as provided by our new applicant.
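Here is a rough sketch of that balancing step, again with hypothetical columns and synthetic data: the demographic label is used only to draw an equal number of reliable payers from each group, and it is discarded before the model ever sees a feature.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 2_000

# Hypothetical records; 'group' is kept only to balance the sample and is
# never passed to the model as a feature.
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "debt_to_income": rng.uniform(0.0, 0.6, n),
    "years_employed": rng.integers(0, 30, n),
    "group": rng.choice(["A", "B"], n, p=[0.7, 0.3]),  # A is overrepresented
    "repaid": rng.integers(0, 2, n),
})

# Draw an equal number of reliable payers from each group so that neither
# demographic dominates the 'pays back loans on time' class.
reliable = df[df["repaid"] == 1]
per_group = int(reliable["group"].value_counts().min())
balanced_reliable = reliable.groupby("group").sample(n=per_group, random_state=0)
train = pd.concat([balanced_reliable, df[df["repaid"] == 0]])

features = ["income", "debt_to_income", "years_employed"]
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(
    train[features], train["repaid"]
)

# The new applicant is scored on these features alone; no demographic
# fields exist in the record the model sees.
applicant = pd.DataFrame(
    [{"income": 42_000, "debt_to_income": 0.35, "years_employed": 4}]
)
print("P(pays back):", model.predict_proba(applicant)[0, 1])
```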

If the model consistently classifies individuals in this new dataset, who share similar attributes and feature values with our hypothetical applicant, and the result is the same, how can we claim bias?
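A quick way to sanity-check this is sketched below, with the same hypothetical setup as above: feed the model two records with identical feature values and compare the scores, then compare average scores for similar records grouped by the label we attached after the fact.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 2_000

# Hypothetical records; 'group' is an after-the-fact label, never a feature.
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "debt_to_income": rng.uniform(0.0, 0.6, n),
    "years_employed": rng.integers(0, 30, n),
    "group": rng.choice(["A", "B"], n),
    "repaid": rng.integers(0, 2, n),
})

features = ["income", "debt_to_income", "years_employed"]
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(
    df[features], df["repaid"]
)

# Two applicants with identical feature values but different post hoc group
# labels receive exactly the same score, because group membership is not an
# input to the model.
same_features = pd.DataFrame(
    [{"income": 42_000, "debt_to_income": 0.35, "years_employed": 4}] * 2
)
print(model.predict_proba(same_features)[:, 1])  # two identical probabilities

# A coarser audit: average predicted score for records whose features are
# similar to the applicant's, split by the group label attached afterwards.
window = df[df["income"].between(35_000, 50_000) & (df["debt_to_income"] < 0.4)]
audit = window.assign(score=model.predict_proba(window[features])[:, 1])
print(audit.groupby("group")["score"].mean())
```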
We absolutely cannot and should not discriminate against anyone based on their race, sex, gender, religion, etc. We cannot have that, end of story. However, saying that a person is exempt from the realities of an analysis that never took race, sex, gender, religion, etc. into account in the first place, simply because they happen to be a member of a ‘protected class’, is itself the definition of being fill-in-the-blank-ist.
