The National Institutes of Health is trying to mask the identity of applicants in an experiment exploring racial bias in grant reviews.

派脆客 Lee/Flickr (CC BY-NC-ND 2.0)

NIH finds using anonymous proposals to test for bias is harder than it looks

An effort to test whether reviewers are biased against blacks applying for grants from the National Institutes of Health (NIH) is proving to be much harder to carry out than expected.

A 2011 study commissioned by NIH concluded that black applicants were 35% less likely than white researchers to snare a grant. That disturbing finding has prompted NIH to invest $250 million in increasing the diversity of the biomedical research community as well as to look inward at possible bias in its vaunted grantmaking system.

NIH’s review process begins with an initial assessment of proposals by a handful of outside scientists. Those proposals contain lots of information on the applicant as well as the idea being proposed. But later this year a fresh set of reviewers will be asked to rate a pool of 1200 previously submitted proposals that have been stripped of all personal identifiers, including the applicant’s name, institution, place of training, and even collaborators.

The novel exercise, a year in the making, is so sensitive that NIH is contracting out the work to avoid even the appearance of a conflict of interest in the outcome. There have also been technical challenges in preserving the quality of grant proposals that have been scrubbed of all identifiers.

“It’s taken so long because it turned out to be much harder than I thought,” says Richard Nakamura, head of NIH’s Center for Scientific Review in Bethesda, Maryland, which is funding the new study. “We wanted to keep the sense of the science, but to do that we had to add some dummy code and other information that would help reviewers understand what was being proposed without disclosing anything about the individual applicant.” For example, NIH decided that even geographic references were a no-no.

A heavy lift

Some researchers who have studied the inner workings of the peer-review process question whether the results will actually provide any useful information for closing the racial disparity.

“I don’t think anonymization will work, but it’s the first thing that people think of,” says Molly Carnes, a professor of geriatrics and director of the Center for Women’s Health Research at the University of Wisconsin in Madison. Carnes leads a team that has poked at the dynamics of peer review by recreating study sections. Among their findings is that ambiguous standards for reviewing grant proposals and comments from other reviewers can influence the panel’s assessment of the proposed research. Those variations could also lead to bias, she says, although the group has not specifically examined racial factors.

Carnes, who serves on an NIH advisory body that Nakamura briefed last week on the planned study, also speculates that reviewers won’t be comfortable not knowing anything about who would be carrying out the research. “Scientists have a relentless need to categorize,” she notes. “So I suspect they will not rest easy until they know more about the applicant. Even if they have nothing else to go on, they might Google the science described in the proposal for clues about where it’s coming from.”

Scientists use that information to help them assess the researcher’s chances of success, adds Elizabeth Pier, a postdoc at the center Carnes runs. “That’s just how reviewers have always operated, and any change would require a paradigm shift.”

The study will compare 400 black and white applicants, matched by research topic, gender, degree, type of institution, and original score. An additional 400 white applicants will be chosen at random. The pool includes both proposals eventually funded and those that were declined. The proposals were originally submitted in 2014–15, but none remains under active consideration.

The study will be done by Social Solutions International (SSI), a small, minority-owned company in Rockville, Maryland. SSI hopes to do a preliminary study this summer before launching the full effort in the fall.

Nakamura decided to focus on the first step of NIH’s normal review process, in which every application chosen for review is critiqued by three reviewers and assigned a preliminary score. An analysis of the 2011 data, he notes, found that the discrepancy between black and white applicants was “due entirely to their lower preliminary score.” The analysis, he adds, found no indication of bias during the second step, the discussion of each application by the study section, nor in the final stage, when the proposal is approved for funding by the advisory body for the relevant NIH institute or center.

In the experiment, each of the new reviewers will assess six to eight anonymized proposals. Those results will then be compared with the scores given to the same proposals by the original reviewers to see whether there are differences according to race.

NIH officials will also be looking at whether the anonymization affects the current distribution of scores by gender, stage of career, or type of institution. Applications from Asian-American and Hispanic scientists have been excluded, Nakamura says, because the 2011 study found that they fared only marginally worse than their white counterparts.

Worries about design

Even so, Carnes’s team believes there is ample opportunity for bias in that second stage. Anna Kaatz, director of computational sciences at the center, says one problem arises when members disagree on the quality of a specific proposal. The ensuing discussion, which their papers call “score calibration talk,” can lead to groupthink that erases minority views.

A second potential problem is substituting the characteristics of the applicant for the quality of the proposal. “A comment like ‘These guys have published in Nature and Science and Cell, so they must know what they are doing,’ can discriminate against someone with a great idea but who lacks the right pedigree,” Kaatz says.

A lot of work has been done on how these subtle but powerful biases can affect decision-making, says Carnes, who’d like to see NIH make use of it in training its reviewers and professional staff. Nakamura says he’d welcome any information on detecting bias, adding that he hasn’t seen any studies specifically on racial disparities in the peer-review process.

Despite her concerns, Carnes applauds NIH for taking what she calls an important first step. “I think it’s wonderful that NIH is willing to shine a light on its own processes,” she says. “It’s the right thing to do, and it needs to be done.”