reCAPTCHA – Spam Protection with a Purpose

Spam protection on the web is a big deal, especially for independent publishers who don’t have much time to focus on anything besides creating content for their sites.

Many people think that spam only comes in the form of those annoying email messages you receive offering you pharmaceuticals or stocks, but there are many types of spam out there.  There are spam blogs, spam websites, search engine spam and more.  The specific type of spam that is most bothersome for bloggers and web publishers is comment spam.

Spammers have created virtual robots that crawl the web looking for forms to fill out with their messages and URLs in an attempt to boost their SEO.

A simple solution, which I’ve used at TheFutureBuzz, is to add a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) to the comment area.  It’s those squiggly letters and numbers that ensure a legitimate comment is being left by a person, not a spam robot.

I never really thought much of this other than as a way to stop spammers, until today when I stumbled upon reCAPTCHA.  Some brilliant people at Carnegie Mellon University decided to put the CAPTCHA idea to good use:

“About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.”
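
For anyone who wants to check the math in that quote, here is a quick back-of-the-envelope sketch. The function name is my own, and the inputs are just the two figures quoted above (60 million solves a day, roughly ten seconds each), so treat it as a rough estimate rather than anything official:

```python
def daily_captcha_hours(solves_per_day: int, seconds_per_solve: float) -> float:
    """Estimate the total human hours spent per day solving CAPTCHAs."""
    seconds_per_hour = 3600
    return solves_per_day * seconds_per_solve / seconds_per_hour


# Figures taken from the reCAPTCHA quote: ~60 million solves, ~10 seconds each.
hours = daily_captcha_hours(60_000_000, 10)
print(f"{hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

That lands a little above the quoted "more than 150,000 hours," which fits with both inputs being rough numbers.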

The digital archiving of books is a massive project that several organizations are currently undertaking in an attempt to make those books publicly available and searchable.  The problem is that the OCR (Optical Character Recognition) technology used to turn the scanned pages into text isn’t perfect.  Here’s a visual of why OCR isn’t perfect:

reCAPTCHA solves this problem by using some of those 60 million CAPTCHAs completed daily to input words from old books that are unreadable by OCR:

Now when you leave comments here, the CAPTCHA isn’t just keeping out spam; you’re also helping to digitize books for research purposes and for future generations to enjoy.

reCAPTCHA supports most blog platforms as well – get it here.