AI Model Fundamentally Cracks Captchas, Scientists Say

indubitable | 8 years ago

Seems like natural language processing would be an interesting direction for captchas.

- A man is running. A dog is behind him barking and growling. What does the man think might happen?

- A man goes up the stairs to the roof. He walks to the very edge of the building. He takes one more step. What is the man trying to do?

The correct answer should be pretty easy to parse out, and I'd expect a better success rate for humans than with some of today's captchas, which increasingly look more like Magic Eye puzzles than character recognition. But of course the big question is generation. Can this sort of implication-based story be generated in such a way that the final text cannot trivially be reversed to the answer (without even considering the 'meaning' of the question)? And for that matter, can they even realistically be generated en masse?

dmead | 8 years ago

You're in a desert walking along the sand when all of a sudden you look down and see a tortoise. You reach down and flip the tortoise on its back. The tortoise lays on its back, its belly baking in the hot sun, but you're not helping. Why is that, Leon?

sixhobbits | 8 years ago

People always come at this from the angle of "what can I do that computers can't?" You need to take into account the incredible diversity of people who use the internet, and what they can and can't do. There's already a viral article written by an old lady who can't pass the current captchas. Add to this people who don't speak English, or don't speak it well; people who struggle to read and comprehend text in any language; people who struggle with logical reasoning; etc., etc., etc. The lowest common denominator for a task that can be easily solved by any human is pretty low.

RobertoG | 8 years ago

It's a good idea, but you would need versions in different languages.

belorn | 8 years ago

When writing such captcha questions for a forum, I generally use Google as a validation step, checking that the top listed links don't answer the question. This allows me to easily adjust questions to the point where natural language processing should not be able to answer them but a human would.

Hyperbolic | 8 years ago

Yep, Question-Answer driven Semantic Role Labeling (QA-SRL) is an interesting research project around crowdsourcing NLP datasets. https://dada.cs.washington.edu/qasrl/

olegkikin | 8 years ago

> A man is running. A dog is behind him barking and growling. What does the man think might happen?

So what's the correct answer here?

* That's a mean dog

* I hope it's on a leash

* I hope it's not going to start running after me

* I hope I don't get bitten

* Where can I hide?

* OMG, I will get rabies!

DalasNoin | 8 years ago

You couldn't easily autogenerate them. If the captchas aren't unique, people could store the answers.

taesis | 8 years ago

This [1] is the article they're citing. Note that a cursory search turns up similar claims from back in 2013; it might be worth waiting for someone with more experience and less bias to express their opinions before dumping your captcha-related stocks.

[1]: http://science.sciencemag.org/content/early/2017/10/26/scien...

thisisit | 8 years ago

> captcha-related stocks.

Are there companies relying only on selling captchas for revenue?

vonnik | 8 years ago

Is this the same old news from Vicarious? They announced this four years ago and have raised about $100M since then...

http://www.slate.com/blogs/future_tense/2013/10/28/captcha_c...

I thought the world moved on.

reilly3000 | 8 years ago

Since when was captcha not broken? Sites like http://www.deathbycaptcha.com/user/order have been around for ages. Yes, a mere $6.95 gets you 5000 captchas solved by OCR and humans in an average of 6 seconds. Imagine that job.

Sure, AI can break captchas, but it can be done at scale for far less than the cost of AI research and a GPU rig.
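A quick sketch of the arithmetic behind that price, taking the quoted $6.95 for 5000 solves at face value (the figures come from the comment; current pricing is not checked here):

```python
# Figures quoted above: $6.95 buys 5000 solved captchas.
PRICE_USD = 6.95
SOLVES = 5000

def cost_per_solve(price_usd: float, solves: int) -> float:
    """Price of one solved captcha, in US dollars."""
    if solves <= 0:
        raise ValueError("solves must be positive")
    return price_usd / solves

per_solve = cost_per_solve(PRICE_USD, SOLVES)
print(f"${per_solve:.5f} per captcha")      # $0.00139 per captcha
print(f"${per_solve * 1000:.2f} per 1000")  # $1.39 per 1000
```

At about 0.14 cents per solve, a model would have to be very cheap to build and run before it undercut the human-plus-OCR service, which is the point being made.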

Google's approach to bot recognition incidentally trains its own bots, so even an adversarial network attempting to bypass it would give it a ton of training data on the way to breaking in.

notatoad | 8 years ago

>Imagine that job.

I don't believe it's a job. Isn't this the thing where captchas on target sites are simply mirrored on other sites like sketchy filehosts? Real human users are solving captchas to access some content hosted by this service, and the solution they enter is passed through to the target site.

_pdp_ | 8 years ago

The Google Cloud Vision API will do this for ~$3, but you will need to automate it yourself, which you might need to do anyway with other services.

partycoder | 8 years ago

Vicarious demoed cracking captchas at least 3 years ago.

Dileep George, cofounder of Vicarious, is the former Numenta CTO, and claimed to use probabilistic graphical models as a basis for their tech.

https://www.youtube.com/watch?v=-H185jPf-7o

habitue | 8 years ago

I don't see how captchas are "fundamentally cracked" if they only claim a success rate of around 2/3 at best. Nor do they explain what they mean by "fundamentally cracked".

ComputerGuru | 8 years ago

Before you can say that a 66% success rate isn't good enough, you need to compare it to the human success rate. I barely get 2/3 myself.

sobellian | 8 years ago

A captcha is cracked if it becomes economical to try to pass it over and over again. If you have a script that succeeds in spamming a forum 2 out of 3 times it tries, you've got a successful spamming system.
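The economics can be made concrete with a geometric distribution, under the simplifying assumption (not stated in the comment) that each attempt succeeds independently with the same probability p and that the site doesn't rate-limit or block after failures. With p = 2/3, a bot needs 1/p = 1.5 attempts per success on average:

```python
def expected_attempts(p: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p

def success_within(p: float, k: int) -> float:
    """Probability that at least one of k independent tries succeeds."""
    return 1 - (1 - p) ** k

p = 2 / 3  # success rate discussed in the thread
print(expected_attempts(p))            # 1.5
print(round(success_within(p, 3), 3))  # 0.963
```

Even a 10% solver needs only 10 tries per success on average, which is why a low but nonzero success rate can still count as "cracked" in this economic sense.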

What they mean by fundamentally cracked is that this method seems to be more robust against minor variations of spacing, font, etc. than CNN-based models.

willchang | 8 years ago

Captchas are useless for their intended purpose if a bot can get them right more than half the time.

taneq | 8 years ago

10 points to the first person to hack up a CAPTCHA using Winograd schemas.
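A toy sketch of what that might look like, assuming a small hand-written pool of schemas (the two below are the well-known trophy/suitcase and councilmen/demonstrators examples from the Winograd schema literature). Each schema has a special word whose swap flips the correct referent, so the answer can't be read off the sentence's surface. The class and function names are hypothetical, a pool this small is trivially memorized, and generating schemas at scale is not addressed here.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class WinogradSchema:
    sentence: str   # contains a {word} slot for the special word
    question: str   # also contains a {word} slot
    answers: tuple  # ((special_word, referent), (alternate_word, referent))

SCHEMAS = [
    WinogradSchema(
        "The trophy doesn't fit into the brown suitcase because it is too {word}.",
        "What is too {word}?",
        (("large", "trophy"), ("small", "suitcase")),
    ),
    WinogradSchema(
        "The city councilmen refused the demonstrators a permit because they {word} violence.",
        "Who {word} violence?",
        (("feared", "councilmen"), ("advocated", "demonstrators")),
    ),
]

def make_challenge(rng: random.Random) -> tuple[str, str, str]:
    """Return (prompt, expected referent, distractor referent)."""
    schema = rng.choice(SCHEMAS)
    i = rng.randrange(2)  # which of the two special words to use
    word, expected = schema.answers[i]
    distractor = schema.answers[1 - i][1]
    prompt = f"{schema.sentence.format(word=word)} {schema.question.format(word=word)}"
    return prompt, expected, distractor

def check(expected: str, distractor: str, response: str) -> bool:
    """Accept free-form answers like 'The trophy.' that name only the right referent."""
    words = response.lower().replace(".", " ").replace(",", " ").split()
    return expected in words and distractor not in words

rng = random.Random()
prompt, expected, distractor = make_challenge(rng)
print(prompt)
```

Note that a blind guess between the two nouns already succeeds half the time, so a real deployment would need to chain several questions per challenge (n questions cut the guessing rate to 0.5^n).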

reacweb | 8 years ago

One of the fundamental problems with captchas is that writing a bot that defeats them is a very interesting exercise for teaching AI.

gruez | 8 years ago

Is there more to this than "text captchas can be broken by deep learning"?

Sir_Cmpwn | 8 years ago

What's the human pass rate for captchas? I bet I've personally failed at least 20% of the captchas I've solved in my lifetime.

trophycase | 8 years ago

At a certain point it will be impossible to create a working captcha. Are we basically engineering a Turing Test?

taneq | 8 years ago

CAPTCHA is literally meant to be an automated Turing test.

It's right there in the name: "Completely Automated Public Turing test to tell Computers and Humans Apart"

nabla9 | 8 years ago

CAPTCHA is a Turing test with the roles reversed.

In a CAPTCHA, computers try to figure out who is human and who is not. In the Turing test, humans try to figure out who is human and who is not.

mitchty | 8 years ago

Some of the captchas I get lately have honestly made me think I'm probably not a human as far too often I can't make heads or tails of the letters.

At a certain point I just give up and refuse to use the worst sites that use this junk.

_pdp_ | 8 years ago

Many types of CAPTCHA systems can be defeated with machine learning models and OCR. Google provides its own: the Google Cloud Vision API. Here is a brief example of how this is done in practice: https://blog.websecurify.com/2017/10/cracking-captchas.html

Perhaps this is old news, as this technique has been around for a while, but I find that it is still relevant in many of the cases I have encountered.

Furthermore, in my experience, Google's failure to improve the visual appeal of reCAPTCHA's "I am not a robot" widget is one of the key reasons many organisations simply don't use it.

briga | 8 years ago

I think rather than being broken, captcha models are just going to be made more complex. Maybe they'll start asking you to write a poem or play a problem-solving mini-game.

taeric | 8 years ago

I'd expect adversarial images to take off in captcha space. Don't try and avoid the models, exploit them.
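One concrete form of that idea is the fast gradient sign method (FGSM), named here as a swapped-in example rather than anything proposed in the comment. The sketch applies it to a hypothetical two-feature logistic-regression "solver": nudging each input feature by eps in the direction that increases the solver's loss flips its prediction. Whether such perturbations would transfer to an attacker's unknown captcha model is an open assumption, not something this shows.

```python
import math

def predict(w: list[float], b: float, x: list[float]) -> float:
    """Logistic-regression probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

def fgsm(w: list[float], b: float, x: list[float], y: int, eps: float) -> list[float]:
    """Fast gradient sign method against a logistic-regression model.

    For binary cross-entropy loss, d(loss)/d(x_i) = (predict(x) - y) * w_i,
    so each feature moves by eps in the sign of that gradient.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

# Hypothetical toy model and input; true label y = 1.
w, b = [2.0, -1.0], 0.0
x = [1.0, 0.5]
x_adv = fgsm(w, b, x, y=1, eps=0.6)
print(round(predict(w, b, x), 3))      # 0.818
print(round(predict(w, b, x_adv), 3))  # 0.426
```

In a captcha setting the defender would play the attacker's role here, adding small perturbations to the challenge so humans still read it correctly while a model misreads it.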

wheresmyusern | 8 years ago

A lot of services use Facebook to verify that someone is a human. There should be a service that exists only to manage people's identities online. Sign up, provide some ID, an address, and the last four digits of your Social Security number. Later, maybe a letter is sent to the address and returned with a verification code. Then every other service on the internet could use that service to prevent bots, spam, and other things.
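The mailed-code step is the easiest part of that plan to sketch. Below is a minimal version, assuming hypothetical names (PendingVerification, MAX_ATTEMPTS) and leaving out the hard parts (ID checks, postal delivery, storage). Because a 6-digit code has only a million possible values, the sketch caps guesses per code and compares in constant time with hmac.compare_digest.

```python
import hmac
import secrets
from dataclasses import dataclass

MAX_ATTEMPTS = 5  # hypothetical cap; a 6-digit code has only 10**6 values

@dataclass
class PendingVerification:
    code: str          # printed on the letter mailed to the user's address
    attempts: int = 0  # guesses made so far

def issue_code(digits: int = 6) -> PendingVerification:
    """Create a random numeric code using a cryptographically secure RNG."""
    code = "".join(secrets.choice("0123456789") for _ in range(digits))
    return PendingVerification(code=code)

def verify(pending: PendingVerification, submitted: str) -> bool:
    """Check a submitted code, refusing further guesses once the cap is hit."""
    if pending.attempts >= MAX_ATTEMPTS:
        return False
    pending.attempts += 1
    # Constant-time comparison so timing doesn't reveal how many digits matched.
    return hmac.compare_digest(pending.code.encode(), submitted.strip().encode())

pending = issue_code()
print(verify(pending, "not-the-code"))  # False
print(verify(pending, pending.code))    # True
```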

pinum | 8 years ago

I can foresee absolutely no potential problems with this plan...

sitepodmatt | 8 years ago

Centralized identification services? Thinking of some names, how about Experian or Equifax?

kwhitefoot | 8 years ago

Here in Norway we use something like that for access to banks, tax, pensions, social services, etc. All of these services allow you to log in with what is called BankID. You apply for BankID and supply an ID like your passport, and then all the other banks and institutions accept it. It uses a two-factor scheme with SMS, code cards, apps in SIM cards, etc.

But of course this isn't available to be used by some random kitten video trading site.

Also why would I want to give up my real identity to a lot of the sites that use a captcha?

lsseckman | 8 years ago

Are you aware of the book The Circle?
