Science thrives when ideas, hypotheses, data, and knowledge are quickly and easily shared within disciplines and communities. Traditionally, this was done through personal interactions, letters, lectures, and articles in professional journals. The advent of the Internet and Web accelerated information sharing with tools such as e-mail, online publishing, digital libraries, and comprehensive search engines such as Google. Researchers and developers are now exploring a new idea that many believe will further enhance scientists' ability to share knowledge: the Semantic Web.
What (and Why) Is the Semantic Web?
The Semantic Web idea emerged from the confluence of several communities--artificial intelligence, hypertext, Web developers--so there are a number of ways to appreciate its motivation and goals. Perhaps the easiest for one who does not belong to any of those communities is to consider that much of what we want to know (that is actually known) is available on the Web. Thus the Web is, potentially, a great resource for software agents, which can be programmed to extract and fuse information from multiple, heterogeneous sources in response to a query.
However, extracting meaning from text is a very challenging task for computer programs. Although progress is being made, a robust solution is decades, if not generations, away. So, the Semantic Web is an approach to encoding and publishing information in ways that make it easier for computers to understand, thus making the Web agent-friendly. What do we mean by "making it easier for computers to understand"? On the Semantic Web, we mean through recourse to ontologies: formal descriptions of particular domains.
What Is an Ontology?
Ontology is the branch of philosophy that seeks to answer the question "What is there?" In computer science, an ontology is a formal conceptualization of a domain. Typically, it specifies the classes of objects that exist, the relations among those classes, the possible relations among instances of the classes, and constraints over those instances. An ontology also defines terms denoting these classes and relations as well as individual objects. Current Web ontology languages, designed to encode information on and for the Web, use the eXtensible Markup Language (XML) both for specifying ontologies and for making assertions about the world using terms defined in ontologies. A Semantic Web page begins by listing (as URLs) the locations of the ontologies to be used, then goes on to use those ontologies to make assertions about data sets, human beings, items for sale, etc. An agent, on coming to such a page, can import the specified ontologies and use that information to understand the semantics of the ensuing assertions.
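To make the definition concrete, here is a minimal Python sketch of an ontology as a set of classes, a relation with a constraint on what it may connect, and a few typed individuals. It is a toy data structure rather than a Web ontology language, and the food web vocabulary (Predator, Prey, eats, osprey, menhaden) is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A toy ontology: classes, typed relations, and typed individuals."""
    classes: set = field(default_factory=set)
    # relation name -> (class allowed as subject, class allowed as object)
    relations: dict = field(default_factory=dict)
    # individual name -> class name
    individuals: dict = field(default_factory=dict)

    def check(self, subject, relation, obj):
        """Return True if the assertion respects the relation's constraint."""
        if relation not in self.relations:
            return False
        subject_class, object_class = self.relations[relation]
        return (self.individuals.get(subject) == subject_class
                and self.individuals.get(obj) == object_class)


# Hypothetical vocabulary, invented for this example.
food_web = Ontology(
    classes={"Predator", "Prey"},
    relations={"eats": ("Predator", "Prey")},
    individuals={"osprey": "Predator", "menhaden": "Prey"},
)

print(food_web.check("osprey", "eats", "menhaden"))  # True
print(food_web.check("menhaden", "eats", "osprey"))  # False
```

An agent that has imported such a structure can reject, or at least flag, assertions that violate its constraints, which is one small part of what "understanding the semantics" means in practice.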
Ontologies on the Web
The World Wide Web Consortium (W3C) has developed standards that enable ontologies to be published on the Web, and data and other assertions to be encoded using terms drawn from any published ontology. These standards make it possible for programs and software agents to understand information published on the Web without the ambiguity and complex processing inherent in traditional unstructured forms (e.g., natural language) or the rigidity inherent in structured representations (e.g., relational databases).
The Resource Description Framework (RDF) is a simple language, commonly written in XML, for defining computer-understandable vocabularies that people and programs can use to describe things of interest, such as Web sites, newspaper articles, e-mail messages, people, books, events, or Web services. RDF mimics human languages in that it allows one to introduce new terms (individuals, classes, and properties) that are defined (partially, at least) in terms of existing terms.
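Underneath any particular syntax, an RDF description is a set of subject-predicate-object statements. The sketch below, in plain Python, represents two such statements about a book and writes them in N-Triples, a line-based RDF syntax that is easier to read than the XML form. The `example.org` names are placeholders invented for illustration; the title and creator properties are taken from the Dublin Core element set.

```python
# Placeholder namespace for the things being described (an assumption).
EX = "http://example.org/library#"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Each statement is (subject, predicate, object). The object is tagged as
# either a URI or a plain string literal so it can be serialized correctly.
triples = [
    (EX + "book1", DC_TITLE, ("literal", "On the Origin of Species")),
    (EX + "book1", DC_CREATOR, ("uri", EX + "darwin")),
]


def to_ntriples(statements):
    """Serialize statements in N-Triples, one statement per line."""
    lines = []
    for subject, predicate, (kind, value) in statements:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


def objects_of(statements, subject, predicate):
    """Return every value asserted for the given subject and predicate."""
    return [value for s, p, (_, value) in statements
            if s == subject and p == predicate]


print(to_ntriples(triples))
print(objects_of(triples, EX + "book1", DC_TITLE))
```

Because every term is a URI, a second publisher can reuse `darwin` or the Dublin Core properties in its own statements, and an agent can merge the two sets of triples without any prior coordination.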
The RDF-based Web Ontology Language (OWL) supports more advanced capabilities, such as logical inference and translation between descriptions that use different ontologies. Examples of these capabilities include describing a food chain in terms of the hierarchy of organisms in an ecosystem and mapping a location specified as a ZIP code to one specified by latitude and longitude.
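The simplest kind of inference is following a class hierarchy: if an individual is asserted to be an osprey, and ospreys are raptors, then the individual is also a raptor. The Python sketch below does only that, under the assumption that each class has a single parent. It is not an OWL reasoner, and the class names are invented for illustration.

```python
# Hypothetical class hierarchy for a food web (child class -> parent class).
SUBCLASS_OF = {
    "Osprey": "Raptor",
    "Raptor": "Bird",
    "Bird": "Animal",
    "Menhaden": "Fish",
    "Fish": "Animal",
}


def superclasses(cls):
    """Follow subclass links upward and return every ancestor, nearest first."""
    ancestors = []
    # The second condition stops the walk if the hierarchy contains a cycle.
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in ancestors:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def infer_types(asserted):
    """Expand each individual's asserted class to include its superclasses."""
    return {name: [cls] + superclasses(cls) for name, cls in asserted.items()}


asserted_types = {"osprey_17": "Osprey"}
inferred = infer_types(asserted_types)
print(inferred["osprey_17"])  # ['Osprey', 'Raptor', 'Bird', 'Animal']
```

A query for all animals observed at a site would now find `osprey_17`, even though nobody ever stated directly that it is an animal.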
Figure 1 shows portions of an OWL document that uses terms defined in a number of ontologies to describe a food web: a model of which organisms eat which in a specified ecosystem.
Figure 1. Excerpts from food web ontology
Many Worlds, Many Ontologies
A problem in the effort to formalize (or "ontologize") scientific domains is that there are typically many different ways of doing so. Within a single discipline, there can be disagreement about how to describe the world. Also, scientific disciplines overlap and often look at the overlapping area from different points of view. One approach to the ontology heterogeneity problem is to create a global schema to serve as an interlingua for human and software agents. In our Semantic Prototype in Research Ecoinformatics (SPIRE) project, we are pursuing another approach, more illustrative of the spirit of the Semantic Web. We do not believe that the conceptual space occupied by any domain of active, interdisciplinary research can be captured by a single, consistent ontology. So the construction of a global schema is not our goal.
Instead, we envision and are encouraging the development of a number of relatively small ontologies, some of which may overlap, and some of which may be in conflict. Semantic mediation is achieved through the construction of binary mappings among relevant ontologies. We expect that a future feature of online ontology repositories (e.g., Open Biological Ontologies, discussed previously in Next Wave) will be a facility to make assertions about relations among concepts appearing in different ontologies. We are already experimenting with this facility in SPIRE.
Although it is an open question whether this approach will work for the Web as a whole (due to the vast number of micromodels that people use to express themselves), there is strong reason to believe it will work in scientific domains, where there is typically general agreement on an underlying paradigm, coupled with a small number of formalizations of various aspects of that paradigm. Even in cases without a dominant paradigm, there are often a small number of schools of thought in which research is grounded. We expect to be able to report results from this approach in the biocomplexity and biodiversity domains in the next year or so.
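A binary mapping of the kind described above can be pictured as a lookup table from the terms of one ontology to their counterparts in another. The Python sketch below is a simplified illustration, not the SPIRE implementation. The prefixes `a:` and `b:` stand for two hypothetical ontologies, `data:` marks individuals, and every term name is invented.

```python
# Hypothetical mapping from terms in ontology "a" to terms in ontology "b".
A_TO_B = {
    "a:preysOn": "b:eats",
    "a:Organism": "b:LivingThing",
}


def translate(triple, mapping):
    """Rewrite a triple's terms using the mapping.

    Returns the translated triple together with a list of terms from
    ontology "a" that have no counterpart, so gaps are reported
    rather than silently passed through.
    """
    translated = tuple(mapping.get(term, term) for term in triple)
    unmapped = [term for term in triple
                if term.startswith("a:") and term not in mapping]
    return translated, unmapped


print(translate(("data:osprey", "a:preysOn", "data:menhaden"), A_TO_B))
print(translate(("data:osprey", "a:nestsIn", "data:snag"), A_TO_B))
```

The first call yields a statement that an agent working in ontology `b` can use directly. The second leaves `a:nestsIn` untouched and reports it, which is the honest outcome when two ontologies only partly overlap.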
Other Ingredients of the Semantic Web
Information encoded in Semantic Web languages is only a part of the ecology of the Semantic Web. Other components include search engines capable of finding Semantic Web content (e.g., Swoogle and the W3C's Ontaria); parsers and reasoning engines capable of understanding Semantic Web pages and drawing conclusions from assertions made within them (e.g., Jena, a Semantic Web Framework for the Java programming language, and RDFStore for creation and management of RDF models); and agents that take advantage of Semantic Web infrastructure to perform useful tasks.
SemWebCentral and SemanticWeb.org provide good listings of Semantic Web tools either available or in production. The construction of tools and systems to discover, interpret, integrate, evaluate, and respond to information on the Semantic Web is an active research area with much work remaining to be done.
Lend a Hand, Win a T-shirt
One of our primary goals in the SPIRE project is to facilitate the wide-scale publishing of scientific data in RDF and OWL. We want to make it easy to automatically generate semantic markup from local databases. Because the most time-consuming part of the Semantic Web publishing process is ontology development, we are promoting a casual ontology development process that enables simple, skeletal ontologies to be quickly crafted for a specific publishing purpose. We invite readers to select a data set (or other source of information) that they would like to share; to use Swoogle to search for ontologies that would allow them to mark up the data set in OWL, creating an appropriate ontology if one doesn't exist; and to submit the marked-up data set to us. We are very interested to see the different ways that similar domains are ontologized and will publish all ontologies, together with mappings among related and identical concepts. For more detailed instructions, see the SPIRE methodology page. The first 10 submissions will receive a SPIRE T-shirt and the satisfaction of participating in an important experiment. For guidance in the ontology development or markup process, feel free to contact Joel Sachs.
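For readers wondering what generating markup from a local database might involve, here is a rough Python sketch that turns each row of a table into RDF statements in the N-Triples syntax. It is not the SPIRE tooling. The base URI, the vocabulary URI, and the `sightings` table are all assumptions made for the example, and a real data set would use property names drawn from a published ontology rather than raw column names.

```python
import sqlite3

# Placeholder URIs, invented for illustration.
BASE = "http://example.org/observations/"
VOCAB = "http://example.org/vocab#"


def rows_to_ntriples(connection, table, key_column):
    """Emit one N-Triples line per non-null cell.

    The key column identifies the subject; every other column name is
    used as a property name. The table name is assumed to be trusted,
    since it is interpolated directly into the query.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{VOCAB}{column}> "{literal}" .')
    return lines


# A tiny in-memory database standing in for a local data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sightings (id INTEGER PRIMARY KEY, species TEXT, n_seen INTEGER)"
)
conn.execute("INSERT INTO sightings VALUES (1, 'Pandion haliaetus', 2)")

for line in rows_to_ntriples(conn, "sightings", "id"):
    print(line)
```

The mechanical part is short. The real work, as noted above, lies in choosing or building the ontology whose terms replace the placeholder vocabulary.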
How Are Scientists Responding?
Among the first adopters and co-creators of the Semantic Web have been scientists who see in the new approach the promise of content-based information retrieval, distributed data mining, and automated Web-service choreography. Another Semantic Web application for scientists is an annotation framework that would pull the work of the citizen scientist (the dedicated amateur who makes potentially important observations in, e.g., astronomy or ecology) into mainstream scientific discourse.
In last year's Semantic Web Challenge, one of the prizes was awarded to AnnoTerra, a system that extracts key words from NASA Earth Observatory news feeds and uses them to perform ontology-based searches on the Global Change Master Directory for relevant resources. In other words, it serves up relevant scientific data with the latest news. Other examples of research projects exploring the use of the Semantic Web in science include SPIRE, Science Environment for Ecological Knowledge, Semantic Web for Earth and Environmental Terminology, and the Semantic Grid project.
Some observers believe that the Semantic Web idea is oversold and is unlikely to be realized anytime soon. Of course, it's well known that it's difficult to make predictions, especially about the future, but we believe the Semantic Web will have an important impact and see signs that it is already being adopted. Adobe's products use RDF to encode and manipulate metadata in various file formats such as PDF documents and images. We have built a prototype indexing and retrieval engine, Swoogle, that crawls the Web looking for Semantic Web documents, and we find that their number is growing. Two simple ontologies, RDF Site Summary and the Friend of a Friend project, are widely used, by millions of Web pages.