One of the things I added to this blog when I moved from my own software to WordPress was the red and yellow box in the comments section, which defends this blog against comment spam by asking commenters to decipher a couple of words. Such challenge-response systems are called CAPTCHAs (a tortured and unmellifluous acroynm of “completely automated public Turing test to tell computers and humans apart”). What really caught my imagination about the CAPTCHA I’m using, called reCAPTCHA, is that it uses words from books scanned by the Internet Archive/Open Content Alliance. Thus at the same time commenters solve the word problems they are effectively serving as human OCR machines.
To date, about two million words have been deciphered using reCAPTCHA (see the article in Technology Review lauding reCAPTCHA’s mastermind, Luis von Ahn), which is a great start but by my calculation (100,000 words per average book) only the equivalent of about 20 books. Of course, it’s really much more than that because the words in reCAPTCHA are the hardest ones to decipher by machine and are sprinkled among thousands of books.
Indeed, that is the true genius of reCAPTCHA—it “tells computers and humans apart” by first using OCR software to find words computers can’t decipher, then feeds those words to humans, who can decipher the words (proving themselves human). Therefore a spammer running OCR software (as many of them do to decipher lesser CAPTCHAs), will have great difficulty cracking it. If you would like an in-depth lesson about how reCAPTCHA (and CAPTCHAs in general) works, take a listen to Steve Gibson’s podcast on the subject.
The brilliance of reCAPTCHA and its simultaneous assistance to the digital commons leads one to ponder: What other aspects of digitization, cataloging, and research could be aided by giving a large, distributed group of humans the bits that computers have great difficulty with?
And imagine the power of this system if all 60 million CAPTCHAs answered daily were reCAPTCHAs instead. Why not convert your blog or login system to reCAPTCHA today?