Computers that learn from human writing automatically come to see certain occupational words as masculine and others as feminine.

Benedetto Cristofani/@Salzmanart

Even artificial intelligence can acquire biases against race and gender

One of the great promises of artificial intelligence (AI) is a world free of petty human biases. Hiring by algorithm would give men and women an equal chance at work, the thinking goes, and predicting criminal behavior with big data would sidestep racial prejudice in policing. But a new study shows that computers can be biased as well, especially when they learn from us. When algorithms glean the meaning of words by gobbling up lots of human-written text, they adopt stereotypes very similar to our own.

“Don’t think that AI is some fairy godmother,” says study co-author Joanna Bryson, a computer scientist at the University of Bath in the United Kingdom and Princeton University. “AI is just an extension of our existing culture.”

The work was inspired by a psychological tool called the implicit association test, or IAT. In the IAT, words flash on a computer screen, and the speed at which people react to them indicates subconscious associations. Both black and white Americans, for example, are faster at associating names like “Brad” and “Courtney” with words like “happy” and “sunrise,” and names like “Leroy” and “Latisha” with words like “hatred” and “vomit” than vice versa.

To test for similar bias in the “minds” of machines, Bryson and colleagues developed a word-embedding association test (WEAT). They started with an established set of “word embeddings,” basically a computer’s definition of a word, based on the contexts in which the word usually appears. So “ice” and “steam” have similar embeddings, because both often appear within a few words of “water” and rarely with, say, “fashion.” But to a computer an embedding is represented as a string of numbers, not a definition that humans can intuitively understand. Researchers at Stanford University generated the embeddings used in the current paper by analyzing hundreds of billions of words on the internet.
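In code, the underlying idea is usually measured with cosine similarity between embedding vectors. The minimal sketch below assumes gensim’s small downloadable “glove-wiki-gigaword-50” vectors for convenience; the study itself used much larger GloVe embeddings trained on web-scale text.

```python
# A minimal sketch, assuming gensim's downloadable "glove-wiki-gigaword-50"
# vectors; the study used far larger GloVe embeddings trained on web text.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads ~66 MB on first use

# Words that share contexts end up with similar vectors (high cosine similarity).
print(vectors.similarity("ice", "water"))    # relatively high
print(vectors.similarity("steam", "water"))  # relatively high
print(vectors.similarity("ice", "fashion"))  # much lower

# To the computer, each "definition" is just a list of numbers (50 of them here).
print(vectors["ice"][:5])
```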

Instead of measuring human reaction time, the WEAT computes the similarity between those strings of numbers. Using it, Bryson’s team found that the embeddings for names like “Brett” and “Allison” were more similar to those for positive words like “love” and “laughter,” and the embeddings for names like “Alonzo” and “Shaniqua” were more similar to those for negative words like “cancer” and “failure.” To the computer, bias was baked into the words.

IATs have also shown that, on average, Americans associate men with work, math, and science, and women with family and the arts. And young people are generally considered more pleasant than old people. All of these associations were found with the WEAT. The program also inferred that flowers were more pleasant than insects and musical instruments were more pleasant than weapons, using the same technique to measure the similarity of their embeddings to those of positive and negative words.
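The WEAT itself boils down to comparing average cosine similarities and summarizing the difference as an effect size. A rough sketch of that calculation, again using a small pretrained GloVe model via gensim and short stand-in word lists rather than the paper’s full stimuli:

```python
# A sketch of the WEAT statistic: each target word's mean cosine similarity to
# one attribute set minus its mean similarity to the other, summarized as an
# effect size across the two target sets. Word lists are short stand-ins, not
# the study's full stimuli; vectors come from a smaller pretrained GloVe set.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(word, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    sims_a = [cosine(vectors[word], vectors[a]) for a in attrs_a]
    sims_b = [cosine(vectors[word], vectors[b]) for b in attrs_b]
    return np.mean(sims_a) - np.mean(sims_b)

def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Standardized difference in association between the two target sets."""
    x = [association(w, attrs_a, attrs_b) for w in targets_x]
    y = [association(w, attrs_a, attrs_b) for w in targets_y]
    return (np.mean(x) - np.mean(y)) / np.std(x + y)

flowers    = ["rose", "tulip", "daisy", "lily"]
insects    = ["ant", "wasp", "moth", "beetle"]
pleasant   = ["love", "peace", "laughter", "happy"]
unpleasant = ["hatred", "failure", "agony", "awful"]

# A positive score means flowers sit closer to pleasant words than insects do.
print(weat_effect_size(flowers, insects, pleasant, unpleasant))
```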

The researchers then developed a word-embedding factual association test, or WEFAT. The test determines how strongly words are associated with other words, and then compares the strength of those associations to facts in the real world. For example, it looked at how closely related the embeddings for words like “hygienist” and “librarian” were to those of words like “female” and “woman.” For each profession, it then compared this computer-generated gender association measure to the actual percentage of women in that occupation. The two were strongly correlated. So embeddings can encode everything from common sentiments about flowers to racial and gender biases and even facts about the labor force, the team reports today in Science.
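A simplified version of the WEFAT calculation might look like the following, where the occupation percentages are placeholders standing in for the labor-force statistics the researchers actually used:

```python
# A sketch of the WEFAT idea: score each occupation word by how much closer it
# sits to "female" words than to "male" words, then correlate those scores with
# the share of women in each occupation. The percentages below are placeholders
# for illustration; the study used real U.S. labor-force figures.
import numpy as np
from scipy.stats import pearsonr
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

female_words = ["female", "woman", "she", "her"]
male_words   = ["male", "man", "he", "his"]

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def gender_score(word):
    """Mean similarity to female words minus mean similarity to male words."""
    f = np.mean([cosine(vectors[word], vectors[w]) for w in female_words])
    m = np.mean([cosine(vectors[word], vectors[w]) for w in male_words])
    return f - m

# Placeholder percentages of women per occupation (illustrative only).
percent_women = {"hygienist": 97.0, "librarian": 83.0, "nurse": 90.0,
                 "programmer": 21.0, "engineer": 14.0}

# Skip any occupation word missing from this smaller vocabulary.
occupations = {w: p for w, p in percent_women.items() if w in vectors}
scores = [gender_score(w) for w in occupations]
r, _ = pearsonr(scores, list(occupations.values()))
print(f"correlation between embedding score and share of women: r = {r:.2f}")
```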

“It’s kind of cool that these algorithms discovered these,” says Tolga Bolukbasi, a computer scientist at Boston University who concurrently conducted similar work and reached similar results. “When you’re training these word embeddings, you never actually specify these labels.” What’s not cool is how prejudiced embeddings might be deployed—when sorting résumés or loan applications, say. For example, if a computer searching résumés for computer programmers associates “programmer” with men, men’s résumés will pop to the top. Bolukbasi’s work focuses on ways to “debias” embeddings—that is, removing unwanted associations from them.
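One common debiasing idea, sketched below in simplified form, is to estimate a “gender direction” from word pairs such as “she” and “he” and project it out of an occupation word’s embedding; Bolukbasi’s published method is more elaborate than this.

```python
# A simplified sketch of projecting a gender direction out of an embedding.
# Bolukbasi's published debiasing method is more elaborate than this.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

def unit(v):
    return v / np.linalg.norm(v)

def cosine(u, v):
    return float(np.dot(unit(u), unit(v)))

# Rough gender direction: the difference between the "she" and "he" vectors.
gender_direction = unit(vectors["she"] - vectors["he"])

word = vectors["programmer"]
# Remove the component of the word vector that lies along the gender direction.
debiased = word - np.dot(word, gender_direction) * gender_direction

print(cosine(word, gender_direction))      # leans toward one gender
print(cosine(debiased, gender_direction))  # ~0 after projection
```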

Bryson has another take. Instead of debiasing embeddings, which essentially throws away information, she prefers adding an extra layer of human or computer judgment that decides how, or whether, to act on such biases. In the case of hiring programmers, you might decide to set gender quotas.

People have long suggested that meaning could plausibly be extracted through word co-occurrences, “but it was far from a foregone conclusion,” says Anthony Greenwald, a psychologist at the University of Washington in Seattle who developed the IAT in 1998 and wrote a commentary on the WEAT paper for this week’s issue of Science. He says he expected that writing—the basis of the WEAT measurements—would better reflect explicit attitudes than implicit biases. But instead, the WEAT results more closely resembled IAT biases than they did surveys of racial and gender attitudes, suggesting that we may convey prejudice through language in ways we don’t realize. “That was a bit surprising,” he says. He also says the WEAT might be used to test for implicit bias in past eras by testing word embeddings derived from, say, books written in the 1800s.

In the meantime, Bryson and her colleagues have also shown that even Google is not immune to bias. The company’s translation software converts gender-neutral pronouns from several languages into “he” when talking about a doctor, and “she” when talking about a nurse.

All of this work “shows that it is important how you choose your words,” Bryson says. “To me, this is actually a vindication of political correctness and affirmative action and all these things. Now, I see how important it is.”