BIAS IN MACHINE LEARNING / AI

(Stereotyped By Online Dating Apps)

MC
5 min read · Apr 24, 2021

Over the last few years, there has been a huge increase in the application of data to software systems. Buzzwords like AI (Artificial Intelligence), Machine Learning (ML), Deep Learning, neural nets, etc. keep coming up. However, few people stop to think about the impact of these data-driven approaches on society, and/or the ethical considerations that should be put in place to guide them.

Edited Picture Created by Alex Iby from Unsplash

In a previous article about anonymizing data, I gave the example of identifying the pink Mustang in a dataset that has only one pink Mustang. This article presents yet another case where careful consideration is needed when handling data, but from a different angle: the propagation of bias by algorithms. This most commonly happens when biased data is used to train an algorithm; it can also occur because of a lack of data. Most algorithms or analytical methods used today do not perform well when they come across data that is new to them. For instance, if you use demographic data that is 75% Caucasian, 20% Asian, and 5% Mexican, any analysis on this data alone would not generalize well to other populations. It would most likely give poor predictions when it encounters people of mixed origins, immigrants, or African Americans. In my case, dating apps persistently made strict assumptions about my dating preferences based on stereotypes.
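To make that generalization problem concrete, here is a minimal sketch. The group labels, sizes, and outcome rates are all invented for illustration; the "model" is deliberately trivial (it always predicts the overall majority outcome), but it shows how strong aggregate accuracy can hide poor predictions for the smallest group in a 75/20/5 split.

```python
# Hypothetical demographic data mirroring the 75/20/5 split above.
# Each group maps to (positive outcomes, group size); the base rates
# are invented purely for illustration.
data = {"A": (60, 75), "B": (8, 20), "C": (1, 5)}

total_pos = sum(pos for pos, _ in data.values())
total = sum(n for _, n in data.values())

# A trivial "model": always predict the overall majority outcome,
# which is dominated by the largest group.
majority = 1 if total_pos > total / 2 else 0

def group_accuracy(group):
    pos, n = data[group]
    correct = pos if majority == 1 else n - pos
    return correct / n

for g in data:
    print(g, round(group_accuracy(g), 2))
# Group A scores 0.8 while group C scores 0.2: overall accuracy looks
# acceptable, yet predictions for the underrepresented group are mostly wrong.
```

A real model is more sophisticated than a majority-class predictor, but the same dynamic applies: minimizing average error lets the majority group dominate what gets learned.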

Picture By Kevin Li from Unsplash

How did this happen?

A few months ago, I was hanging out with a friend and we jokingly spoke about how dating has changed over time. I asked him how he was managing that with Covid, and this led us to start talking about online dating. It had been a while since I had considered online dating, so I decided to give it a try. The first site I registered on seemed somewhat interesting based on the questions it asked, so I kept going.

I completed the basic profile information, uploaded pictures, and provided a summary of myself and what I was looking for in a partner. I was excited to move on to the next part…matches. After going through quite a few of these, I noticed that about 99.5% of the matches I got were of the same race as me. I then went back to my preferences and excluded matches from my own race. The selection of matches I got remained mostly the same. After a few days of seeing the same results, and after making sure it was not because of my location, I got especially curious. I now wanted to know whether this would happen on other online dating platforms, so I decided to try a few more sites and see.

I tried three other dating sites (one a veteran of the online dating business, another a new service created by a renowned company, and the third an interracial dating app). The experience was largely the same, even on the interracial app. This was surprising, as I had chosen that app specifically because it claimed to cater to women of my race looking for partners of other races. I had to wonder if they meant ‘intra’ and not ‘inter’ 😃

These experiences got me thinking about the root cause. Barring human error, the only explanation that seemed to make sense was that the algorithms were reinforcing the stereotype that people date within their own race. But what if we took race out of the picture and made an algorithm learn compatibility from other attributes of the individuals? Given the need for diversity and inclusion, it seemed to me that systems like these, which reinforce sweeping stereotypes, should be revamped. I wondered what would happen if more people were exposed to compatible matches outside their race. People might not know it is an option because it has never been presented to them. You might not know you like Jello until you try it (openness).

Reinforcing stereotypes through algorithms is not limited to one industry. Consider the mortgage industry in the 1930s, when redlining was overtly practiced and encouraged by institutions like the Home Owners' Loan Corporation (HOLC) (see here). Redlining is when an area is flagged as high risk for mortgage loans because of its racial makeup. Although the practice has long been declared illegal, its effects are still widely felt. A data scientist today might use historical data to figure out the rates to offer clients. Because of practices such as redlining, the rates for African Americans and foreigners in certain areas of the country would have been higher in that dataset. An algorithm trained on this data picks up the pattern and inadvertently reinforces it, assigning higher rates when it sees the characteristic markers of a specific race or location.
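As a toy illustration of that feedback loop: the zip codes and rates below are invented, and the "model" is just a per-area average rather than any real pricing algorithm, but it shows how pricing from historical rates alone reproduces past discrimination even though race never appears as a feature — the location acts as a proxy.

```python
from collections import defaultdict

# Hypothetical historical mortgage data: (zip_code, rate_offered).
# Rates in the "redlined" zip were inflated by past discriminatory practice.
history = [
    ("11111", 4.0), ("11111", 4.1), ("11111", 3.9),  # non-redlined area
    ("22222", 6.0), ("22222", 6.2), ("22222", 5.8),  # formerly redlined area
]

# A naive pricing model: offer each applicant the average historical
# rate for their zip code. Race is never used as an input.
by_zip = defaultdict(list)
for zip_code, rate in history:
    by_zip[zip_code].append(rate)
model = {z: sum(rates) / len(rates) for z, rates in by_zip.items()}

print(model["11111"])  # ~4.0
print(model["22222"])  # ~6.0 — the historical bias is reproduced
```

Dropping the sensitive attribute is not enough: as long as a correlated feature like location remains, the model quietly re-learns the old pattern.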

Diversity

While these paragraphs focused on dating and mortgage data, diversity, inclusion, and openness are important in every industry. Data can inherently reinforce biases and/or stereotypes, so much thought should be put into how the algorithms that drive these industries generalize and are applied. Regulations like GDPR are a step in the right direction, but they are mostly focused on data and privacy. Currently, not much oversight exists on the reach and use of analytical or ML algorithms. The EU recently proposed regulations to this effect, and I hope more people and governments will take up the charge. Scientists and policy makers should constantly stop and ask whether it is ethical and correct to apply a given algorithm to the situation at hand. Who might get left out or treated unfairly, and what can we do about it?

While these experiences are anecdotal, I expect I am not the only one who has had them. Feel free to share any interesting dating experience you have had.

References:

Anonymizing data: https://towardsdatascience.com/anonymizing-data-sets-c4602e581a35

Redlining maps: https://dsl.richmond.edu/panorama/redlining/#loc=5/39.283/-82.551

Wilkinson, Mike (2017) Michigan’s segregated past — and present (Told in 9 interactive maps) https://www.bridgemi.com/michigan-government/michigans-segregated-past-and-present-told-9-interactive-maps

EU Regulating AI: https://www.nytimes.com/2021/04/16/business/artificial-intelligence-regulation.html


MC

Data professional with extensive experience leading data science and data engineering product teams. Experienced in several technologies, including big data.