An unusual experiment to test whether an applicant’s apparent race or gender influences how their grant proposal is scored has found no evidence of bias.
The study, which involved re-evaluating proposals already funded by the National Institutes of Health (NIH) in Bethesda, Maryland, after the applicants’ names had been altered, is part of an ongoing effort by NIH officials to detect bias in their vaunted peer-review process. Some scientists who have read the NIH-funded study, described in a preprint posted online on 25 May, object to what they say is its implication that bias doesn’t exist. But the authors say they are making no such claim.
“I’ve made a career out of studying bias and how to overcome it. I know the problem to be real. But here in this particular context, it may not be the place where the bias shows itself,” says psychologist Patricia Devine of the University of Wisconsin (UW) in Madison, who led the study.
The impetus for the project was an explosive 2011 study led by economist Donna Ginther of The University of Kansas in Lawrence. It found that black applicants have a success rate 10 percentage points lower than white applicants in winning NIH research grants, even after accounting for an applicant’s institution and research record. NIH responded to the disturbing finding by pouring money into programs such as a mentoring network for minority researchers and funding research on unconscious racial bias during grant review, including Devine’s study.
Working with UW graduate student (later postdoc) Patrick Forscher and others, Devine adapted an approach previously used to study bias in situations such as hiring. The team made a list of first names for black and white American babies that were popular several decades ago (to square with the current age of a typical NIH-funded researcher) and added common black or white surnames—Tyrone Jackson for a black man, Greg Sullivan for a white man, for example. They then substituted the fake name for the original investigator’s name throughout 48 NIH bread-and-butter R01 research grants that were funded in 2012, creating five different versions of each proposal. Two listed a presumably white man as principal investigator (PI), and three contained the names of an apparent white woman, a black man, or a black woman.
The UW team then recruited scientists with relevant expertise to review the proposals. The reviewers were paid $300 and asked to apply standard NIH criteria in their assessment, which they were told would not be used to make funding decisions.
Devine’s team didn’t want to deviate too far from the demographics of a typical NIH applicant pool. So each reviewer received two proposals from fictitious white men; the third proposal bore the name of an imaginary white woman, black man, or black woman. To protect their secret, the researchers told reviewers not to look up references. Even so, 34 of the 446 reviewers admitted to doing so and then realizing the investigator’s name was fake; they were dropped from the study.
Devine says the results were clear: Applicants’ average overall impact scores, determined on a scale of one to nine, differed by less than a quarter of a point for white women, black men, and black women compared with scores for white men. The pattern held for specific research topics, grants of varying quality, and whether the reviewer was a white man. “We just didn’t find any evidence that when you randomly assign race and gender names to identical proposals that there was any bias favoring the white male PI,” Devine says.
That’s not to say there’s no bias in NIH grantmaking, she adds. But if it exists, it must crop up at some other point in the process. One possibility is the relative effectiveness of the training that an applicant receives in how to write a persuasive proposal. Another is when reviewers meet to discuss and give final scores to each application.
Whereas some researchers praised the study on Twitter, others panned it. A blogger known as Drugmonkey argued that reviewers had likely figured out the names were fake and deliberately gave their applications good scores to “show that they are not a bigot.” Devine questions that explanation, saying such behavior would likely generate higher scores for proposals with female and black names, which wasn’t the case.
Some have also questioned Devine’s decision to use only funded proposals, saying it fails to explore whether reviewers might show bias when judging lower quality proposals. But she and Forscher point out that half of the 48 proposals were initial submissions that were relatively weak in quality and only received funding after revisions, including four that were of too low quality to be scored.
Raynard Kington, a former NIH deputy director and a co-author of the Ginther study, says the UW team’s strategy was “reasonable.” But changing only the names leaves out other information that could influence a reviewer to discriminate against a black investigator, such as where they were trained. “A name is just one factor among many ways in which your race and gender are embedded in everything you do,” says Kington, who is now president of Grinnell College in Iowa.
NIH officials declined to comment on the unpublished study. But Noni Byrnes, acting director of NIH’s Center for Scientific Review, which manages peer reviews for most grants, noted that her center is funding another experiment to assess bias. It involves re-evaluating 1200 grant proposals, including some that were not funded, after they have been stripped of all personally identifiable information. The project is now recruiting reviewers, who she hopes won’t have heard too much about the study in advance. “We don’t want the results to be skewed in any way,” Byrnes says.
*Update, 1 June, 11:05 a.m.: This article has been updated to include a response from Devine’s team to one criticism of their study.