How do people and algorithms engage in discrimination? How would we know if they did? And in the face of evidence, would people actually call for accountability?

I’m here in New Orleans at the annual meeting of the Society for Personality and Social Psychology, where I gave a talk on high-volume experimentation at the political psychology pre-conference. Today I liveblogged a series of talks by Sendhil Mullainathan, Moritz Hardt, and Jennifer Richeson, leading scholars on human and machine bias and discrimination. I’m reporting their talks in sequence.

Up first is Sendhil Mullainathan, a professor of computation and behavioral science at the University of Chicago Booth School of Business.

Sendhil tells the story of an audit study of labor market discrimination in the US that he ran with Marianne Bertrand nearly 20 years ago. They sent out identical resumes with white-sounding and black-sounding names and found that the white-seeming resumes received 50% more callbacks. This applies to more than just names: Doleac and Stein found that differences in skin color cut the rate of sales in online markets by over half.

Discrimination is widespread in society. What do we do about this? Psychologists have found that we can make progress by taking judgment away from people. In 2010, Sendhil would have thought that (a) humans are biased, and (b) the more rules we put in place, the more we can protect ourselves from ourselves. Algorithms would have seemed like a great way to solve this problem.

Next, Sendhil tells us about a recent project with Obermeyer, Powers, and Vogeli studying health systems designed to help potentially “heavy users” with chronic conditions. To provide this program, healthcare providers needed to find out who the heavy users would be. So the healthcare provider created an algorithm to identify the people with the highest need.

Auditing algorithms: Sendhil Mullainathan at SPSP

Might this algorithm discriminate between groups? To find out, Sendhil and his colleagues compared people who got into the program with people just beneath the cutoff. At every given score level, African Americans are significantly sicker. Among the people auto-enrolled in the program, whites are much less sick than blacks. To support black patients at similar levels of need, the fraction of auto-enrolled black patients would have to double.
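To make the audit concrete, here is a minimal sketch of the kind of comparison Sendhil describes, written in Python with hypothetical column names (risk_score, race, n_chronic_conditions); it illustrates the approach, not the study’s actual code.

```python
import pandas as pd

def audit_by_score(patients: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Average number of chronic conditions per risk-score decile, broken out by race.

    If one group is consistently sicker at the same algorithmic score,
    the score understates that group's need.
    """
    patients = patients.copy()
    patients["score_decile"] = pd.qcut(patients["risk_score"], q=n_bins, labels=False)
    return (
        patients.groupby(["score_decile", "race"])["n_chronic_conditions"]
        .mean()
        .unstack("race")
    )
```

Consistently higher averages for one group at every decile is exactly the signature of the bias described above.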

Sendhil points to a rich body of work that drew attention to the problem of algorithmic bias, including research by Latanya Sweeney, Cathy O’Neil’s writing, and a 2016 White House report. If machines could also make biased decisions, maybe Sendhil’s 2010 thinking wasn’t right.

How to Compare Algorithmic Bias and Human Bias

If your goal is to improve the lives of disadvantaged people, how should we think about algorithms compared to humans?

If a firm comes to Sendhil asking whether it should use an algorithm to review resumes, could he tell them to go back to biased humans? To answer that question, Sendhil says, we need ways to compare human bias to algorithmic bias. He suggests three ways to compare them:

First, how can we detect bias? With humans, we typically have to set up audit studies that give us statistical evidence of bias. While this information is helpful, it’s hard to pinpoint the source of the problem. Algorithms, Sendhil tells us, are easier to examine (with proper regulation and access to the algorithm), even though it’s still hard work.
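As a rough illustration of what that statistical evidence looks like in the human case, here is a sketch of a two-proportion test on callback rates; the counts are invented for the example, not Bertrand and Mullainathan’s data.

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented example counts: callbacks out of resumes sent in each condition.
callbacks = [96, 64]         # white-sounding names, black-sounding names
resumes_sent = [1000, 1000]

z_stat, p_value = proportions_ztest(count=callbacks, nobs=resumes_sent)
print(f"callback rates: {callbacks[0] / resumes_sent[0]:.1%} vs {callbacks[1] / resumes_sent[1]:.1%}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

A test like this can show that a disparity exists, but it says nothing about where in the hiring process the bias enters, which is Sendhil’s point about pinpointing the source.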

Second, how do we understand the source of bias? With humans, the source of the bias is hard to uncover. With algorithms, it’s easier to pin down. In the health system’s case, it turned out that the goal was to identify the sickest patients, but the algorithm was designed to find the most expensive patients. That’s terrible, says Sendhil, but it’s easier to analyze the algorithm and understand what went wrong.

Third, how can we fix the problem? Nearly 20 years after the audit study of humans, almost nothing has changed in labor market discrimination, despite the study being widely publicized. With algorithms, it’s easier to fix the problem: the researchers developed a new statistical method, companies reached out to change their systems, and Congress is considering new regulation. Overall, it’s easier to fix machines than to change minds, Sendhil says.
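As Sendhil describes it, the core problem was the choice of label (cost stood in for health), so the fix amounts to changing what the model predicts. A minimal sketch of that idea, with hypothetical variable names rather than the researchers’ actual method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def flag_top_k(features: np.ndarray, label: np.ndarray, k: int) -> set:
    """Fit a simple predictor of `label` and flag the k patients with the highest predictions."""
    predictions = LinearRegression().fit(features, label).predict(features)
    return set(np.argsort(predictions)[-k:])

# Hypothetical usage, assuming arrays of claims features and two candidate labels:
# flagged_by_cost   = flag_top_k(claims_features, future_cost, k=1000)
# flagged_by_health = flag_top_k(claims_features, chronic_condition_count, k=1000)
# Comparing the two flagged sets shows how much the label choice drives who gets enrolled.
```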

Algorithms hold enormous potential for debiasing society. But machine bias and human bias can also combine to expand discrimination.

(paraphrase of Sendhil Mullainathan)

Sendhil points out a strong counterargument. Because it’s very easy to scale algorithms, they can do a lot of harm before anyone even notices problems. The health system’s algorithm informed healthcare decisions for a hundred million people before anyone noticed the discrimination.

Overall, algorithms hold enormous potential for debiasing society. Despite this potential, Sendhil closes by warning that it would be just as easy to create a world where machine bias and human bias combine to expand discrimination more widely.

Fairness in Machine Learning

How can psychologists collaborate with computer scientists on fairness in machine learning?

Speaking next is Moritz Hardt, an assistant professor in computer science at UC Berkeley. His research aims to make the practice of machine learning more robust, reliable, and aligned with societal values.

Moritz Hardt at SPSP 2020 speaking about fairness in Machine Learning

Moritz tells us about research in computer science on fairness in machine learning. Much of this research goes back to pioneering work in educational testing (Cleary 1968) and economics (Becker ’57, Phelps ’72, Arrow ’73). Around 2010, fairness seemed like a fringe topic in computer science; that changed in 2016, when the White House report, along with other developments, drew the topic to public attention. Moritz, who is a co-organizer of the FAccT community in computer science, introduces some key conversations in the field and reports some of the community’s major discoveries:

The Law of Unawareness. Taking sensitive data out of a model doesn’t make it more fair, because other variables can act as proxies for the sensitive attribute. For example, an algorithm that predicts demand for same-day shipping (as Amazon did in Boston) can still exclude marginalized communities even if race is never an input.
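One way to see the proxy problem, sketched below with scikit-learn and hypothetical inputs: check how well the features that remain after dropping the sensitive attribute can reconstruct it.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_strength(features_without_sensitive, sensitive_attribute) -> float:
    """Cross-validated accuracy of predicting the dropped sensitive attribute
    from the remaining features. Accuracy far above chance means the
    information is still encoded in proxies such as ZIP code."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, features_without_sensitive, sensitive_attribute, cv=5).mean()
```

If this score is high, “unawareness” removed the column but not the information, and downstream predictions can still track it.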

Statistical Fairness Criteria. Computer scientists often discuss different quantitative fairness criteria grounded in different normative ideas of fairness. Common definitions include “demographic parity” (the risk score is distributed the same way in all groups), “error rate parity” (false positive and false negative rates are equal across groups), and “predictive parity” (a given score means the same thing in all groups).
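A rough sketch of how these criteria reduce to group-level statistics for a binary classifier; the array names (y_pred, y_true, group, assumed to be numpy arrays) are mine, not a standard API.

```python
import numpy as np

def positive_rate_by_group(y_pred, group):
    """Demographic parity compares these rates of positive predictions across groups."""
    return {g: y_pred[group == g].mean() for g in np.unique(group)}

def false_positive_rate_by_group(y_pred, y_true, group):
    """Error rate parity compares this (and the false negative rate) across groups."""
    return {g: y_pred[(group == g) & (y_true == 0)].mean() for g in np.unique(group)}

def precision_by_group(y_pred, y_true, group):
    """Predictive parity: among positive predictions, how often the true label is 1."""
    return {g: y_true[(group == g) & (y_pred == 1)].mean() for g in np.unique(group)}
```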

These criteria have limitations. First, they are mutually exclusive: outside of degenerate cases, if you think more than one is reasonable, you can’t have them all at once. Second, they are all about correlations rather than the outcomes caused by algorithm-guided interventions. Third, there’s a large argument about risk scores in the criminal justice system. When ProPublica pointed out that the COMPAS system didn’t have error rate parity, the people behind COMPAS argued that the system was designed to provide predictive parity.
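The mutual exclusivity isn’t just an empirical accident. For a binary classifier, the relevant quantities are tied together by an identity (this is the relation behind the COMPAS dispute; see, e.g., Chouldechova 2017), where $p$ is the prevalence of the outcome in a group:

```latex
\mathrm{FPR} \;=\; \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \left(1-\mathrm{FNR}\right)
```

If two groups have different prevalences $p$, a score with the same PPV in both groups (predictive parity) must have different error rates, and vice versa, which is exactly the ProPublica versus COMPAS standoff.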

But why are we talking about prediction? The scholarly debate around COMPAS was largely about the tension between fairness criteria. Many pointed out that data and measurement biases were also a source of error (since policing patterns are themselves biased). But why should we focus on prediction at all?

Moritz uses the example of failures to appear in court to illustrate the problem with prediction. Imagine that you created a model that predicts failure to appear. If people are failing to appear because of a lack of child care or transportation, inflexible work schedules, or too many court appointments, why not address those issues rather than build a predictive algorithm?

If prediction is too narrow, Moritz asks, how can we think about discrimination and fairness from a broader perspective? Some people ask what interventions yield long-term improvement. Others ask where the locus of discrimination lies: with individuals or with structures. Finally, how can we think about differences in context that might matter to the meaning of fairness? These questions are hard for computer scientists to answer on their own, which creates opportunities for collaborations with social scientists.

Moritz concludes by telling us that many of these issues are also discussed in the upcoming book Fairness and Machine Learning (which is available for free online).

Why We Ignore Discrimination and Avoid Accountability

How do people arrive at mythologies of racial progress despite evidence to the contrary? And how do those mythologies affect people’s willingness to call for accountability?

Jennifer Richeson is a professor of psychology at Yale University, where she studies psychological phenomena related to cultural diversity. Much of her recent research considers how people reason about and respond to societal inequality and injustice.

Jennifer Richeson describes myths of equality in the US

In the United States, Jennifer tells us, we have a mythic idea that across time the country has become less biased, more equal, and more tolerant. We cite Martin Luther King and Barack Obama. We have a sense that this progress occurs naturally, that it’s linear, and that it continues on its own with little effort on our part.

Jennifer’s research has shown that this myth influences people’s perceptions of racial equality and of the effort they think will be required to achieve it. To test this, Jennifer and colleagues asked people to report what they think the racial wealth gap used to be and what it is today. People are wrong about the past, and they are really wrong about today. And when we invoke the mythology of racial progress, people become even more wrong.

When you ask people about racial wealth gaps in the past compared to the present (black wealth per $100 of white wealth), people are wrong about the past. They’re very wrong about the present, says Jennifer Richeson.

Many people think “the children will save us,” believing that attitudes will change as older people die out. But people in the past believed that too, and views about race are broadly shared across generations.

Another myth is that “structural racism is no longer a problem.” It might not be as visible as a sign in the door announcing segregation, but it’s still there. Yet most Americans think the problem is interpersonal racism rather than structural racism.

Another myth is that “overt racism is a thing of the past.” People increasingly attribute differences in outcomes to implicit bias, even when there’s little to no evidence that implicit rather than overt bias is what’s at work.

What are the consequences of attributing discrimination to implicit bias? Working with collaborators, Jennifer found that when news stories attribute institutional problems to implicit bias, people expect doctors, police officers, and institutions to be held less accountable.

How do people think about bias by algorithms? In a recent study, Jennifer and colleagues replicated this study, adding a condition in which an algorithm made the biased decision. People called for accountability in lower proportions when they saw the algorithm, except in the case of the institution that deployed the system.

Is the idea of solving racism through algorithms also a myth that could sustain racism? In our excitement to find systems that can be fixed, Jennifer says, we risk forgetting that millions of people can be harmed by these systems as we put them into the world (Sendhil nods his head vigorously here).

Jennifer concludes by urging us: we need to protect ourselves from the faith that algorithms will make things better without any evidence; that’s just wrong. Whether it’s humans or algorithms, we need to be honest with ourselves about the problems, study them, and hold people accountable.