How much time do you waste searching for other people's results? Do you spend hours in the library and on the Web, just finding the data you need? You won't when you're an e-scientist. You will simply sit at your computer and type in your question. Sophisticated new software, called middleware, will search the world's databases to find existing research results relevant to your query. It'll know how accurate those data are likely to be, and if chunky calculations need to be done to work out the answer, it'll find spare computing capacity on supercomputers and PCs from Torquay to Timbuktu to crunch the numbers. Finally, it'll spit the solution back out at your desk.
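To give a feel for the step where middleware "finds spare computing capacity", here is a toy Python sketch of a broker choosing a machine for a job. Everything in it is hypothetical: the `Resource` record, the `choose_resource` function and the machine names are purely illustrative. Real grid middleware has to weigh many more factors than spare processor count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """A hypothetical computing resource advertised to the grid."""
    name: str
    location: str
    free_cpus: int


def choose_resource(cpus_needed: int, resources: list[Resource]) -> Optional[Resource]:
    """Return the resource with the most spare CPUs that can fit the job, or None."""
    candidates = [r for r in resources if r.free_cpus >= cpus_needed]
    if not candidates:
        return None  # nothing on the grid can take the job right now
    return max(candidates, key=lambda r: r.free_cpus)


# Illustrative pool: the user never needs to know which machine is picked.
pool = [
    Resource("campus-pc-17", "Torquay", 2),
    Resource("supercomputer-a", "Timbuktu", 512),
    Resource("cluster-b", "Edinburgh", 64),
]
job_host = choose_resource(100, pool)
print(job_host.name, job_host.location)  # supercomputer-a Timbuktu
```

The "most spare capacity" rule is only one simple policy. The point is that the choice happens behind the scenes, much as the power grid hides which station generated your electricity.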
It might sound like science fiction, but e-science, or grid computing, looks set to fundamentally change the way that scientists work. Data and computing power will be shared in vast global 'collaboratories,' in which individual researchers won't necessarily know whose data their hypotheses are being tested against, or in which country a computer is running their calculations. And if that sounds a bit frightening to anyone who prefers their chips with salt and vinegar, worry not: the emphasis will be on usability, and no programming expertise will be required.
Grid computing is so called by analogy with the national power grid, explains Vince Osgood, IT and computer science programme manager at the Engineering and Physical Sciences Research Council (EPSRC). When you're using electricity, you simply stick your plug in the wall socket and don't care which power station the electricity comes from, or how it was generated. The UK government's 3-year, £118 million e-science programme, which was announced as part of last year's spending review, is all about building "the equivalent of the wall" into which you push your three-pin plug, Osgood says.
There are two major drivers of this new technology. The first is the obvious interest that funders have in maximising the return on their research pounds. A grid should allow you to get "more science out for your investment than you would have before," according to Osgood. By linking research data more closely and generally making them more accessible, a grid should make discoveries more routine and less serendipitous, so the results will work harder. But the second driver, the onward march of science rather than mere economics, is the real reason for the intense interest in the field. Projects right across science, from the Human Genome Project to the Large Hadron Collider (LHC), are producing ever more information, which has to be stored, manipulated, and made accessible to researchers.
The LHC, due to come on line at CERN in 2006, is the ultimate test of grid technology, according to Guy Rickett, e-science programme manager at the Particle Physics and Astronomy Research Council (PPARC). "Nobody has got a problem as big as this," he says. It's estimated that the LHC will generate five to eight petabytes of data a year (enough to fill a stack of CD-ROMs between 8 and 13 kilometres high) and that those data will be used by 6000 scientists around the world. Astrophysics, too, has numerical nightmares on the horizon. British astronomers are looking forward to the arrival of VISTA, a new telescope capable of making both visible and infrared sky surveys. "The scary thing is when you sit down and work out how fast the data is going to come off the back of this telescope," says Andy Lawrence from the Royal Observatory Edinburgh. Lawrence is involved in AstroGrid, one of the projects that PPARC is funding with its share of the spending review cash. He estimates that the telescope will produce enough data to fill the hard disks of 50 PCs every night.
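The CD-ROM comparison is easy to check with a few lines of Python. The sketch assumes a 700 MB disc that is 1.2 mm thick and uses decimal units (1 PB = 10^15 bytes). Neither figure comes from the article, and a different assumed disc capacity shifts the answer by a kilometre or so.

```python
# Assumptions (not from the article): one CD-ROM holds 700 MB and is 1.2 mm thick.
CD_BYTES = 700e6        # bytes per disc, decimal megabytes
CD_THICKNESS_MM = 1.2   # thickness of one bare disc, in millimetres


def stack_height_km(petabytes: float) -> float:
    """Height of a stack of CD-ROMs holding the given volume of data."""
    total_bytes = petabytes * 1e15            # decimal petabytes
    discs = total_bytes / CD_BYTES            # a fractional disc count is fine for an estimate
    return discs * CD_THICKNESS_MM / 1e6      # 1 km = 1,000,000 mm


for pb in (5, 8):
    print(f"{pb} PB -> {stack_height_km(pb):.1f} km")
# 5 PB -> 8.6 km
# 8 PB -> 13.7 km
```

Under these assumptions, the stack comes out at roughly 8.6 to 13.7 km, close to the 8 to 13 km quoted above.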
Each Research Council has been allocated money to fund grid projects specific to its own area of research, while a core programme, administered by EPSRC, is developing generic grid technologies. The particularly large needs of big physics were recognised in PPARC's allocation of £26 million, the largest single slice of the Office of Science and Technology's investment. A whopping £17 million of this is going to GridPP, a particle physics programme. Meanwhile, EPSRC has just announced the six 'test bed' projects that will share its £17 million quota.
Particle physics may produce vast quantities of data, but those data are very uniform. By contrast, "with biology, the data is extremely complicated," says Carole Goble of the EPSRC-funded 'Mygrid' project, because it comes in "many different forms": images, sequences, numbers, and lots of descriptive text. Goble will be working with a team of fellow computer scientists from Manchester, Newcastle, Sheffield, and Nottingham, along with bioinformaticians in Manchester and at the European Bioinformatics Institute, to develop a grid for biological scientists. At the moment, even asking fairly straightforward questions requires a biologist to spend hours in front of a screen, trawling databases and finding the tools needed to gather evidence. It's a skilled job, she explains, because you need to know where the databases are, how they work (each does so differently, of course!), how the data were derived, and whether you can trust them. In the 'collaboratory' that Mygrid aims to build, it will be simple for the average biologist to build 'workflows' that find the data to check hypotheses and answer questions. Better yet, the software will also keep an eye on the databases as they are revised, so that past query results are updated to reflect the latest knowledge.
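The idea of a workflow that keeps past answers up to date can be sketched in a few lines of Python. This is not Mygrid's design. The `Database` and `Workflow` classes, the version counter and the gene annotations are all hypothetical, chosen only to show a stored result being recomputed when its source database is revised.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Database:
    """A hypothetical versioned data source; every revision bumps the version."""
    records: dict[str, str]
    version: int = 1

    def revise(self, key: str, value: str) -> None:
        self.records[key] = value
        self.version += 1


@dataclass
class Workflow:
    """Runs a query and remembers which database version the answer came from."""
    db: Database
    query: Callable[[dict[str, str]], object]
    result: object = None
    seen_version: int = 0

    def run(self) -> object:
        self.result = self.query(self.db.records)
        self.seen_version = self.db.version
        return self.result

    def refresh(self) -> bool:
        """Re-run the query only if the database has changed; report whether it did."""
        if self.db.version != self.seen_version:
            self.run()
            return True
        return False


# Illustrative use: which (made-up) genes are annotated as kinases?
db = Database({"geneA": "kinase", "geneB": "unknown"})
wf = Workflow(db, lambda recs: sorted(k for k, v in recs.items() if v == "kinase"))
print(wf.run())        # ['geneA']
db.revise("geneB", "kinase")
print(wf.refresh())    # True
print(wf.result)       # ['geneA', 'geneB']
```

Recording the version that each answer came from is what lets the software tell when an old result has gone stale.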
This empowerment of ordinary scientists to make full use of IT without in-depth training is something that Lawrence also expects to see in the 'Virtual Observatory', which AstroGrid is creating. By making all the "key data sets" generated by observatories all over the world available through a single system, it will be possible to "observe the sky from your desk," he says. Moreover, anybody will be able to carry out "specialised analyses that were previously the domain of experts." "Within a few years," predicts Lawrence, "it will be easy for anybody to do new styles of science, because it will just be easier to crunch huge amounts of data."
Jeremy Frey is also hoping to see new types of science develop as a result of grid computing. A Senior Lecturer in physical chemistry at the University of Southampton, Frey is one of the researchers involved in an EPSRC-funded project focussing on the application of e-science to combinatorial chemistry. In particular, he hopes that the 'Structure-Property Mapping' consortium's work on an electronic lab book will reach well beyond one small field. "E-chemistry is a different way of collaborative working," he says, and "if it manages to stitch together different branches of chemistry where different approaches are being taken, it really will have achieved something."
It might begin to sound as though research is set to become so straightforward that it will no longer present any kind of intellectual challenge. Not so, insist those at the forefront of the technology. "You cannot beat bench science," asserts Goble, who says that e-science will simply aid the discovery process. According to Osgood, the point is to "allow scientists to use their full range of skills, but to invest more intelligence on the computer." And Frey maintains that "a good knowledge of chemistry and the ability to solve problems" will continue to be the most important elements in the chemist's toolkit.
Although the grid was initially an academic enterprise, industry is also getting in on the act. Unsurprisingly, computing giants Sun Microsystems and IBM are collaborators on several of the EPSRC-funded projects. But pharmaceutical companies, too, are showing an interest. Given the highly confidential nature of industrial research, it might seem surprising that business is so keen to be involved in something based so squarely on the sharing of data. However, "they use the public databases as much as anybody else," points out Goble, and "the real problem is that [much of] the data is bad." One of the aims of her research is to support the curation of databases and improve the quality of the data, hence big pharma's interest. In addition, the team want to produce a 'Mygrid in a box' developer's kit, which will allow anyone to set up a grid in the same way that they might start a Web site. Companies have an interest in setting up in-house grids to "share corporate knowledge," she suggests. In the future, says Frey, it may even be possible to query a drug company's data through the grid and get an answer without the firm ever having to disclose commercially sensitive information.
Many of the new jobs arising from grid technology will need the specialised skills of an out-and-out computer scientist, and some of the best opportunities are with the core e-science programme, which is offering, for example, 3-year computer science fellowships. But there will certainly be "new jobs in the applications end for scientists interested in grid computing," according to Rickett. At this stage, Lawrence says, many of the new developments are coming from "programmers who come out of a science background, rather than computer professionals." And Goble says that on the Mygrid project at least two of the researchers they are planning to employ are biologists who are moving into computer science. Given that the grid, and investment in it, look set to keep growing apace, Rickett says that "this is the area to be developing skills."