reCAPTCHA: Human-Based Character Recognition via Web Security Measures
Luis von Ahn,* Benjamin Maurer, Colin McMillen, David Abraham, Manuel Blum
CAPTCHAs (Completely Automated Public Turing test to tell Computers
and Humans Apart) are widespread security measures on the World
Wide Web that prevent automated programs from abusing online
services. They do so by asking humans to perform a task that
computers cannot yet perform, such as deciphering distorted
characters. Our research explored whether such human effort
can be channeled into a useful purpose: helping to digitize
old printed material by asking users to decipher scanned words
from books that computerized optical character recognition failed
to recognize. We showed that this method can transcribe text
with a word accuracy exceeding 99%, matching the guarantee of
professional human transcribers. Our apparatus is deployed in
more than 40,000 Web sites and has transcribed over 440 million
words.
Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA.
* To whom correspondence should be addressed. E-mail: biglou@cs.cmu.edu