With the world’s fastest computers capable of processing quadrillions of calculations per second and the amount of data researchers generate growing by an order of magnitude each year, every field of science is turning to supercomputers to do new, innovative, data-intensive things: analyze genomes, simulate the evolution of dark matter, visualize the structure of proteins, and on and on. Supercomputers also offer a number of time- and money-saving opportunities for companies, which increasingly rely on these powerful machines to design everything from jets to sudsier laundry detergent.
Yet, there is a significant barrier for companies that want to take advantage of these new machines: a lack of hardware engineers, software developers, and scientists with high-performance computing skills.
Here are a few universities that offer educational opportunities for domain scientists who wish to acquire high-performance computing training:
• The University of Oklahoma Supercomputing Center for Education & Research helps facilitate interdisciplinary computational science degrees.
• The University of Southern California offers an M.S. degree in computer science specializing in high-performance computing.
• Pennsylvania State University offers an interdisciplinary graduate minor in computational science.
• Indiana University, Bloomington, offers a graduate minor in scientific computing with an emphasis on high-performance computing.
• Dakota State University in Madison, South Dakota, offers undergraduates a minor in high-performance computing.
“We have a tremendous need to really develop expertise at a variety of levels, whether it’s computational scientists who solve particular problems or people looking at software development,” says Alan Blatecky, director of the National Science Foundation’s Office of Cyberinfrastructure. “The capabilities we have at our fingertips are growing rapidly, but we haven’t been growing the same capability in terms of people to take advantage of it.”
As researchers produce more and more data to crunch, national labs and university-affiliated supercomputer centers are expanding and building new supercomputers, which need more and more computer scientists with high-performance computing skills to program and operate them. While precise job descriptions vary widely, these institutions are looking for professionals who can train other computer scientists and collaborate with research scientists to develop modeling programs and software. They also need diagnostic experts who can, for example, recognize problems in a researcher’s buggy code or determine if aberrant data are a result of computer hardware malfunctions, says James Ferguson, director of education, outreach, and training at the National Institute for Computational Sciences at the University of Tennessee.
“We are certainly having trouble finding people with the appropriate skills,” says William Gropp, a professor of computer science at the University of Illinois, Urbana-Champaign, which is installing a new supercomputer called Blue Waters. “Everyone that I’ve spoken to has said that hiring is a problem.”
According to the most recent Taulbee Survey of the Computing Research Association, of the more than 1,300 new computer science Ph.D. graduates who found jobs in North America in 2010, fewer than 2% pursued careers in high-performance computing.
Ph.D. graduates with supercomputer skills often receive multiple job offers, sometimes from different divisions within the same company, says Henry Neeman, director of the University of Oklahoma Supercomputing Center for Education & Research. Those who land private industry jobs can expect to make low-six-figure salaries within a few years.
Reprogramming computer scientists
So why aren't computer scientists pouring into this lucrative field? The heart of the problem seems to be that traditional computer science training doesn't prepare students to tackle the complexities of supercomputing, Neeman says. In many ways, a supercomputer is no different from the laptop on your desk or the smartphone in your pocket. It relies on central processing units (CPUs) to execute program instructions and on memory to store data. “All of the skills you would use to design a regular computer or to develop software for a regular computer, you do need to design a supercomputer or to develop software for a supercomputer,” Neeman says. “But you need a lot of additional skills as well. Many of those skills are not taught in traditional computer science curricula.”
Chief among these skills is being able to deal with the very quality that makes a computer super: its size. Traditionally, computers have grown more powerful as CPUs have gotten faster. But that rate of progress has slowed; processing units today aren't much faster than they were a couple of years ago. “The only way to get more performance now is to divide the work up into smaller pieces and have it done concurrently, or in parallel,” Gropp says. To do this, engineers place multiple processors, or cores, in one CPU.
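Gropp's point -- divide the work into smaller pieces and run them concurrently -- can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (the chunking scheme and worker count are this sketch's own choices, not drawn from any system mentioned in the article): a large sum is split into contiguous chunks, each chunk is handed to a separate process, and the partial results are combined.

```python
# A minimal sketch of dividing work across cores: summing the integers
# in [0, n) by splitting the range into one chunk per worker process.
# The chunking scheme here is illustrative, not from any real HPC code.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum one contiguous chunk of the range."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Split [0, n) into `workers` contiguous chunks.
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs the remainder
    # Each chunk is independent, so the pool can compute them concurrently.
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    n = 1_000_000
    # The parallel answer must match the closed-form serial answer.
    assert parallel_sum(n) == n * (n - 1) // 2
```

On a laptop the speedup from four workers is modest; on a machine with hundreds of thousands of cores, the same decompose-and-combine pattern is the whole game.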
The world’s biggest supercomputer, Japan’s K computer, has tens of thousands of CPUs and more than 705,000 cores. It's like several hundred thousand brains all focused on accomplishing the same goal. Getting those brains to work together in the most efficient manner requires specialized skills that typical computer scientists don't have. “Simply converting old codes for new architectures isn’t working,” says Robert Panoff, founder and executive director of the Shodor Education Foundation, a nonprofit computational science education organization based in Durham, North Carolina.
Massively parallel computing requires a new approach. “For many years we have thought serially; one thing at a time, one thing after the other,” says Charles Peck, an associate professor of computer science at Earlham College in Richmond, Indiana. The challenge now is to figure out the most efficient way to divvy up work to get a job done in the fastest way possible.
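The shift Peck describes -- from one-thing-after-another to many-things-at-once -- hinges on decomposing a problem into pieces whose results don't depend on the order they're computed in. The sketch below (hypothetical, not from any curriculum discussed here) contrasts the two mindsets: a serial loop, and the same computation split into independent pieces that could each go to a different core, with the answer required to be identical no matter how the work is divided.

```python
# Serial thinking: process one element at a time, one after the other.
def serial_total(data):
    total = 0
    for x in data:
        total += x * x
    return total

# Parallel thinking: split the data into independent pieces, compute
# partial results in any order, then combine them (a "reduction").
# Here the workers are simulated with an ordinary loop; on a real
# machine each piece would go to a different core or node.
def chunked_total(data, chunks=8):
    size = max(1, len(data) // chunks)
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    partials = [serial_total(piece) for piece in pieces]  # independent tasks
    return sum(partials)

data = list(range(100))
# Correctness must not depend on how the work was divvied up.
assert serial_total(data) == chunked_total(data)
assert chunked_total(data, chunks=3) == chunked_total(data, chunks=16)
```

Choosing the division -- how many pieces, how big, how to balance them so no core sits idle -- is exactly the efficiency question Peck raises.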
Computer scientists working at the cutting edge of hardware and software performance also need another quality: flexibility. Future speedups aren't likely to come from new advances in CPU design but from completely new kinds of hardware, Neeman says, so computer scientists working on supercomputers need to be able to quickly understand and incorporate new, unexpected technologies.
Computer science education has been slow to keep up with supercomputer advances. “The capability of the hardware is moving much more quickly than the material which we teach,” Peck says. University departments are struggling to fit new advances into an already packed curriculum, he says. New material on supercomputing is integrated into existing classes if it is taught at all.
Most people who learn these skills learn them on the job. But enterprising computer scientists looking to break into the supercomputing industry can also learn them on their own. The University of Oklahoma Supercomputing Center offers an online workshop series called Supercomputing in Plain English, aimed at undergraduates, graduate students, and professionals interested in learning supercomputing basics. The last three times the workshop was offered, a few hundred people from 165 institutions around the world took the course -- a sign, Neeman says, of the “hunger” people have to understand supercomputing.
Creating supercomputational scientists
Computer scientists aren’t the only ones who need a reboot. As supercomputers spread to what scientific computing experts call the domain sciences -- biology, chemistry, geology, physics, and so on -- scientists who work in these areas need to know how to incorporate the powerful machines into their research. “They have to understand the capabilities and limitations of the computing device,” Peck says, “and they have to have a sense of whether or not a computational approach is reasonable.”
Computational modeling isn’t new, but supercomputers offer ways to build more realistic models. Most systems -- whether it’s organs in the human body or stars in a galaxy -- are parallel, Panoff says. Scientists who want to develop more advanced simulations to expand their research and to address more complex questions need to become more familiar with computational thinking, he says.
Getting that education can be difficult. Most universities don't have departments that can teach students the mix of math, domain science, and computer science needed to develop sophisticated science models and run analyses on supercomputers. Faculty members with expertise in these areas are often spread across departments and colleges, making it difficult to provide comprehensive computational training and recruit new students, Ferguson says. There are, however, a few universities where domain scientists can earn graduate degrees, minors, or certificates in computational science.
A background in computational science is becoming especially vital for researchers working in genomics and other fields where vast quantities of data are collected and analysis by supercomputer is necessary, Neeman says. Computational science skills also open up more job opportunities for research scientists, allowing them to switch between jobs more easily, Peck says. Neeman agrees: “One thing I’ve observed is that computational science experience within a specific discipline better prepares a person for computational science work in other disciplines.”
People in the field hope that as more people learn about the demand for computer scientists with high-performance computing skills, and for scientists with computational backgrounds, interest among students will grow. “There’s tremendous opportunity here to solve the problems society faces,” Gropp says. Also, “It’s a lot of fun to tackle problems that are too hard for anyone else.”
Erin Wayman is a journalist in Washington, D.C.