After 32 years as a program officer at the National Science Foundation (NSF), George Hazelrigg knows the rules governing peer review, especially the one that says researchers can’t be both an applicant and a reviewer in the same funding competition. Last year, however, he got permission to throw the rules out the window. His experiment, aimed at easing the strain on NSF staff and reviewers produced by a burgeoning number of proposals and declining success rates, not only allowed applicants to serve as reviewers, but it also required them to assess seven competing proposals in exchange for having their own application reviewed.
Some scientists might be horrified by such a “pay to play” system. But researchers in the engineering systems community responded enthusiastically, submitting 60% more proposals than usual by the 1 October deadline. A preliminary NSF evaluation concluded that the process, which used mail reviews rather than the in-person panels that are the norm at NSF, not only saved time and money but may also have improved the quality of the proposals and the reviews.
NSF is now considering whether to expand use of the offbeat approach, which is based in part on NSF-funded research into better voting and decision-making systems. In the meantime, some astronomers have already jumped on the bandwagon: Faced with a similar reviewing crunch, in January the Gemini Observatory will begin using a similar system to allocate observing time on its Hawaii telescope. “Finding good reviewers willing to spend the time is getting harder and harder,” says Rachel Mason, a Gemini astronomer in Hawaii who is coordinating the experiment, called Fast-Turnaround. “People also thought it would be kinda fun to have the chance to read their competitors’ proposals.”
The core problem is familiar to every science administrator. A system that relies upon the willingness of the scientific community to volunteer its time is being stretched to its limits as the number of applications goes up and the chances of success go down. NSF received 49,000 proposals last year, up 53% from 2001. Its budget didn’t keep up, meaning that success rates fell from 31% to 22% over the same period. Those trends have created two related problems: The cost of peer review, in time and money, is rising at the same time that more scientists are complaining about having to spend valuable hours reviewing good ideas that have little chance of being funded.
Rather than wring his hands, however, Hazelrigg went looking for an alternative that avoided one easy answer, namely, limiting the number of submissions. “I didn’t want to put restrictions on the principal investigators,” he says.
Instead, Hazelrigg found inspiration in a 2009 paper in Astronomy & Geophysics, titled “Telescope time without tears: a distributed approach to peer review.” The paper was prompted, says co-author Michael Merrifield, an astronomer at the University of Nottingham in the United Kingdom, by a “bulging file of 113 applications” on his desk for observing time on instruments operated by the European Southern Observatory (ESO)—far more than he had the time or inclination to evaluate. The review system, he says, was “groaning at the seams.”
So Merrifield teamed up with mathematician Donald Saari of the University of California, Irvine, who has written extensively about voting systems, to suggest what Merrifield acknowledges is a “radical alternative.” The idea, rooted in mathematical game theory, is to alter the rules in ways that bring the competition closer to achieving its goals.
In NSF’s case, that meant distributing the evaluation workload more equitably and providing reviewers with a positive incentive to do a good job. The agency calls its approach mechanism design, and it begins by having grant applicants agree to review seven proposals submitted by their competitors. In addition to grading each one, using NSF’s five-point system from excellent to poor, they also rank their set of proposals from best to worst. Hazelrigg says he chose seven “to discourage scientists from being frivolous” in submitting half-baked proposals, because each submission meant a commitment to doing seven reviews. At the same time, he felt that scientists would balk if he set the bar too high.
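The article doesn’t describe how NSF matched reviewers to proposals, so the following is only an illustrative sketch: a simple round-robin assignment in which each applicant reviews the next seven proposals in a circular ordering, which guarantees that every proposal receives exactly seven reviews and that no one reviews their own submission. (A real matching would also have to handle conflicts of interest and topical expertise, which this sketch ignores.)

```python
def assign_reviews(applicants, k=7):
    """Round-robin review assignment (illustrative only).

    Applicant i is assigned the proposals of the next k applicants
    in a circular ordering. Each proposal ends up with exactly k
    reviews, and no applicant is assigned their own proposal.
    """
    n = len(applicants)
    if n <= k:
        raise ValueError("need more applicants than reviews per person")
    return {
        applicants[i]: [applicants[(i + j) % n] for j in range(1, k + 1)]
        for i in range(n)
    }

# Example: 10 applicants, each reviewing 7 competitors' proposals.
assignments = assign_reviews(["P%d" % i for i in range(10)])
```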
Hazelrigg, who heads NSF’s Civil, Mechanical and Manufacturing Innovation (CMMI) Division, says it took more than a year for the agency to approve the pilot. It was announced in May 2013 in a “Dear Colleague” letter to prospective applicants to a program funding research on sensors and sensing systems. Anyone wanting to submit to the October 2013 competition would have to abide by the rules, the letter said, but those who didn’t like the idea could simply wait until the next deadline, in mid-February.
The community’s initial reaction was generally positive, Hazelrigg recalls, but he knew the real test would be the tally of submissions. To his surprise and delight, NSF received 131 applications, some 50 more than the norm for a fall deadline.
The decision to participate in the experiment was a no-brainer for Rolf Mueller, a bioengineer at Virginia Polytechnic Institute and State University in Blacksburg. “I understood that I was agreeing to do a bunch of reviews, but that didn’t affect my decision,” he says. “And it was interesting to see some of the other proposals.” (NSF ultimately agreed to give him $360,000 over 3 years to apply aspects of a bat’s biosonar system to improving human-made radar and sonar systems.)
Another applicant, electrical engineer Arash Takshi of the University of South Florida, Tampa, says the ability to see what his competitors were doing “filled a blind spot for me. Now I know that if I don’t get funded, it’s because of the quality of the other proposals, not something I did wrong.” (His proposal, to develop a more sensitive optical sensor using photosynthetic proteins rather than silicon-based elements, was also funded—his first NSF grant.) Takshi regards the estimated 30 hours he spent doing his required seven reviews as fulfilling part of his duties as an academic researcher.
NSF officials say they have a hunch the pilot led to “more comprehensive reviews.” Each proposal received seven reviews rather than the normal three or four, Hazelrigg notes. “And each review had, on average, 40% more words. I’m not saying that more is better, but we found the overall quality to be at least comparable” to reviews by panels, which review about 60% of all applications (see graphic, below). (Only one entrant, he notes, was disqualified, for failing to meet the 6-week deadline for submitting the reviews.)
One novel aspect of the pilot was its scoring system. Reviewers whose ranking of the seven proposals closely matched what the six other reviewers thought received bonus points that were applied to their own application. The idea was to reward reviewers for taking the job seriously and dissuade them from unfairly denigrating a competitor’s proposal in hopes of giving themselves a leg up. Using such a tactic would presumably cost them the bonus, because it would put their ranking out of step with those of the other reviewers.
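NSF’s actual bonus formula isn’t spelled out here, but one minimal way such a scheme could work is to compare each reviewer’s ranking with the consensus of the other six and award the bonus only when the two are close. The distance measure (Spearman’s footrule) and the threshold below are assumptions for illustration, not NSF’s published method.

```python
def footrule_distance(ranking_a, ranking_b):
    """Spearman footrule: sum of absolute differences in the
    position each proposal occupies in the two rankings."""
    pos_a = {p: i for i, p in enumerate(ranking_a)}
    pos_b = {p: i for i, p in enumerate(ranking_b)}
    return sum(abs(pos_a[p] - pos_b[p]) for p in pos_a)

def consensus_ranking(rankings):
    """Order proposals by their mean rank position across rankings."""
    proposals = rankings[0]
    mean_pos = {
        p: sum(r.index(p) for r in rankings) / len(rankings)
        for p in proposals
    }
    return sorted(proposals, key=mean_pos.get)

def earns_bonus(my_ranking, other_rankings, threshold=6):
    """Award the bonus when a reviewer's ranking is close to the
    consensus of the other reviewers (threshold is arbitrary)."""
    consensus = consensus_ranking(other_rankings)
    return footrule_distance(my_ranking, consensus) <= threshold
```

Under a rule like this, a reviewer who tanks a strong competitor falls out of step with the consensus and forfeits the bonus, which is the incentive the pilot was after.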
Using applicants as reviewers also saved NSF time and money, Hazelrigg says. It takes a program manager 2 to 3 weeks to assemble an on-site review panel, he estimates, a process that starts with identifying some 400 potential reviewers before winnowing the group down to the typical 16- to 20-member panel. The use of mail reviews also meant that NSF didn’t need to provide travel and per diem expenses to bring those reviewers to NSF headquarters in Arlington, Virginia, to vet a stack of proposals. “Our division runs 200 panels a year,” he says, “so that’s a big cost savings.”
As with every merit review system, however, the pilot has some potential downsides. One NSF program manager who asked to remain anonymous worries that the bonus system could discourage innovative ideas that some reviewers might regard as poor bets. “It rewards people for playing it safe,” the program manager says, referring to how applicants might be reluctant to submit a disruptive idea that’s likely to get a mixed reaction from reviewers. “But it’s the outliers who are most likely to come up with the breakthrough.” Hazelrigg plays down that possibility, noting that program managers are not bound by the judgments of reviewers and have the flexibility to recommend a proposal for funding even if it doesn’t receive one of the top scores.
The skeptical program manager also worries that mail reviews make it impossible to hold a face-to-face discussion about the quality of both the proposed science and its broader impacts, the two criteria upon which every NSF proposal is judged. “We need that dialogue to explore all aspects of a proposal,” the manager says.
Mueller and Takshi, however, believe that personal interaction can also have a downside. “Having an argument is a good thing, but sometimes people who are more assertive can carry the day,” Mueller says.
NSF officials are still evaluating whether to expand the CMMI pilot, one of seven experiments the agency ran last year that tinkered with the normal merit review process. One option, to allow virtual reviews, turned out to be a real hit: Some 28% of all NSF panels last year met in cyberspace, a far cry from NSF’s goal of 5%. NSF officials suspect a crackdown on travel costs by the White House contributed to its popularity. Individual NSF programs also tested the impact on the number of applications by switching from two competition cycles per year to one or by accepting proposals at any time rather than setting a deadline. Another pilot offered reviewers the convenience of asynchronous discussions in cyberspace via a moderated message board.
The community’s reaction to such ideas will play a major role in whether NSF adopts any of the tweaks. One group of astronomers, however, has already embraced a version of the distributed reviewer concept detailed in the 2009 “tears” paper that also inspired Hazelrigg.
ESO did not adopt the scheme suggested by the authors, Merrifield and Saari. But after a senior ESO scientist, Markus Kissler-Patig, became director of the Gemini Observatory, an international consortium that operates twin 8-meter telescopes in Hawaii and Chile, he asked his staff to consider the approach. After much discussion, the observatory decided to use applicants as reviewers to allocate 10% of the viewing time on Hawaii’s Gemini North, starting in January.
“We could probably find a group of generous reviewers willing to donate their time,” says Gemini’s Mason. “But the problem is only going to get worse as the workload grows. And if it works, we can expand it to Gemini South.” Without such changes, she predicts, “the existing system is simply going to break down.”