jackalope | 11 years ago

I always assumed Google's use of reCAPTCHA was to augment the OCR used to digitize Google Books, particularly in results the software couldn't confidently match to a word. Is this true? It's interesting that it's still the fallback for the new method.

Argorak|11 years ago

That was the original goal of the project.

http://en.wikipedia.org/wiki/ReCAPTCHA

"By presenting two words it both protects websites from bots attempting to access restricted areas[2] and helps digitize the text of books."

For some time, you could pass a reCAPTCHA test by just entering the more distorted word correctly.
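To make the two-word idea concrete, here is a minimal Python sketch of how such a check could work. It is not Google's implementation. The class name `TwoWordChallenge`, the `votes_needed` threshold, and the normalization rule are all assumptions made for illustration. One word has a known answer (the control) and decides pass or fail; the other is the word OCR could not read, and guesses for it are only tallied.

```python
from collections import Counter


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


class TwoWordChallenge:
    """Hypothetical two-word challenge: one control word, one unknown word."""

    def __init__(self, control_answer, votes_needed=3):
        # The control word's answer is already known to the server.
        self.control_answer = normalize(control_answer)
        # Assumed threshold: how many matching guesses settle the unknown word.
        self.votes_needed = votes_needed
        self.votes = Counter()

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; only the control word is checked."""
        if normalize(control_guess) != self.control_answer:
            return False
        guess = normalize(unknown_guess)
        if guess:
            # The unknown word cannot be verified, so it is recorded as a vote.
            self.votes[guess] += 1
        return True

    def consensus(self):
        """Return the agreed reading of the unknown word, or None if unsettled."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


challenge = TwoWordChallenge("morning", votes_needed=2)
challenge.submit("Morning", "upon")
challenge.submit("morning", "Upon")
print(challenge.consensus())
```

Under these assumptions, a user who types only the control word still passes, which matches the behaviour described above: the unverifiable word never affects the result.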

willlma|11 years ago

This should be the top thread. I find the whole topic of crowdsourcing to compensate for the inadequacies of computer vision (and other inadequacies) fascinating. OCR was the first problem. We've been helping Google Maps identify house addresses for a while now with reCAPTCHA, and with this announcement it looks like Google is finally tackling the problem of image association. Computers suck at determining which pictures contain birds. By making users tag all of the images on the web, they're making image search much more powerful and will hopefully improve the entire field of computer vision.

When I tell my future robot to go get my coffee mug, I don't want it coming back with the PS5 controller.

verroq|11 years ago

I only ever enter the distorted one, works every time.

jrochkind1|11 years ago

That was the original idea behind reCAPTCHA (which originated outside of Google and was acquired by it in 2009), but my understanding is that they long ago ran out of actual text that needed human OCR'ing, and/or found other reasons that approach was no longer helpful.

The "help OCR while also spam protecting" thing isn't currently mentioned on Google's reCAPTCHA product page.

drzaiusapelord|11 years ago

For the past few years the reCAPTCHAs I've seen were illegible text next to easy-to-read text. I think it's obvious that they've run out of the low-hanging fruit and now just have the worst of the worst as placeholders. The move to house numbers just proves that they're kinda running out of badly OCR'd text.

This move isn't too surprising. OCR-based captchas have always been a hack, and the "best" captchas are like having the best collection of duct tape and WD-40. At a certain point you need to stop doing half-assed repairs and remodel.

_lce0|11 years ago

They also used it to decode street address numbers for Street View.