M. Henzinger does a good job of reviewing the current state of search engine technology (Reviews, 27 July 2007, p. 468). In particular, she accurately describes the weaknesses of web search engines: the homonym problem, the synonym problem, the spam problem, and the inability of search engines to reason, as humans do, to improve search results.
Because the article might leave the incorrect impression that web search engines, with all their flaws, represent the best and most modern search technology, it is important to describe a search technology that overcomes their low precision and low recall.
This technology is called deterministic search: search that relies on one-to-one matching between a query and metadata (including cross references) that serve as surrogates for information resources. It is the kind of search performed in an online library catalog.
Deterministic search systems solve the synonym problem with a controlled vocabulary (such as the Library of Congress Subject Headings) that selects one term for each concept and provides cross references from variant terms. For example, a search on "false teeth" leads to a cross reference that says "see Dentures." Deterministic search also solves the homonym problem by assigning unique labels to distinct concepts that share a name, as in "Jaguar (Animal)" and "Jaguar (Automobile)."
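To make the mechanism concrete, here is a minimal Python sketch of exact-match lookup against a controlled vocabulary. It assumes a tiny hand-built vocabulary; the names AUTHORIZED_HEADINGS, SEE_REFERENCES, CATALOG, resolve, and search are hypothetical illustrations, not part of any real catalog system, and the entries are not actual LCSH authority records.

```python
# Hypothetical, hand-built vocabulary for illustration only;
# these entries are NOT taken from real LCSH authority records.
AUTHORIZED_HEADINGS = {
    "Dentures",
    "Jaguar (Animal)",       # qualifiers keep homonyms distinct
    "Jaguar (Automobile)",
}

# "See" cross references: variant term (lowercased) -> authorized heading.
SEE_REFERENCES = {
    "false teeth": "Dentures",
}

# Metadata surrogates: each record is indexed under authorized headings only.
CATALOG = {
    "rec-001": {"title": "Caring for your dentures", "subjects": {"Dentures"}},
    "rec-002": {"title": "Big cats of Central America", "subjects": {"Jaguar (Animal)"}},
    "rec-003": {"title": "Restoring a classic sports car", "subjects": {"Jaguar (Automobile)"}},
}


def resolve(query: str) -> list[str]:
    """Map a query to authorized heading(s) using exact, one-to-one matching."""
    q = query.strip()
    if q in AUTHORIZED_HEADINGS:
        return [q]
    # Synonym handling: follow a "see" reference from a variant term.
    if q.lower() in SEE_REFERENCES:
        return [SEE_REFERENCES[q.lower()]]
    # Homonym handling: an unqualified name matches several qualified
    # headings, so the user is offered the choices instead of mixed results.
    return sorted(h for h in AUTHORIZED_HEADINGS if h.split(" (")[0] == q)


def search(heading: str) -> list[str]:
    """Return IDs of records indexed under exactly this heading."""
    return sorted(rid for rid, rec in CATALOG.items() if heading in rec["subjects"])


if __name__ == "__main__":
    print(resolve("false teeth"))   # the variant term is redirected
    print(resolve("Jaguar"))        # the homonym is split into two headings
    print(search("Jaguar (Animal)"))
```

Because records are indexed only under authorized headings, a query either resolves to a heading and retrieves exactly the records filed under it, or it resolves to nothing; no relevance ranking is involved, which is what distinguishes this sketch from a stochastic search engine.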
Deterministic search systems are built by trained people (such as library catalogers) who apply rules and reasoning to create search platforms with both high precision and high recall. These systems are an excellent (and yes, more expensive) alternative to stochastic search systems such as web search engines.
Jeffrey Beall
Librarian,
University of Colorado at Denver and Health Sciences Center, Denver, CO 80262, USA.