
How not to design a CAPTCHA

310 points| DrewHintz | 14 years ago |plus.google.com | reply

88 comments

[+] Slackwise|14 years ago|reply
I work in medical IT. You'd be surprised how many government sites do similar.

An example would be https://sso.state.mi.us/som/dch/enroll/reg_page1.jsp (You can enter any fake name/email; this is only step one of the registration script. The next page has the captcha in question.)

The captcha is plaintext, right on the page. The data from the captcha isn't even sent to the server, it is processed locally via JavaScript.

So, the bots don't even have to do anything, but humans have to input a meaningless number...

    <input type="text" name="inputNumber" class="entry-field" size="5" tabindex="3">

    <!-- ... -->

    document.write('<div id="layerNum" class="verifyNumber" align="center">');
    document.write('<b>'+str+'</b>');
    document.write('<img src="generateGIF.jsp?number='+str+'">');
    document.write('</div>');
    document.write('<input size="5" type="hidden" name="rdNumber"  value="'+str+'">');

    <!-- ... -->

    <input type="submit" value="Continue" name="submit" onclick="return Valid();">

    <!-- ... -->

    function Valid(){
    // ...
            if(chkRandomNumber()){
              return true;
            }else{
              return false;
            }
    // ...
    }

    function chkRandomNumber(){
      str1=document.all.rdNumber.value;
      str2=document.all.inputNumber.value;
      if(str1!=str2){
        alert("Please check and type the number as shown in the box");
        return false;
      }else{
        return true;
      }
    }
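
To illustrate the point: a script that never runs the page's JavaScript skips Valid() entirely, and even if something did compare the two fields, the expected answer sits in the markup. A minimal sketch, assuming the rendered page contains the hidden rdNumber field exactly as written above (the function name and sample value are made up for illustration):

```javascript
// Hypothetical helper: pull the "secret" number out of the rendered markup.
// It is the same value the page's own script writes into the rdNumber field.
function extractCaptchaAnswer(pageSource) {
  // Matches the hidden field produced by the document.write call above.
  const match = pageSource.match(/name="rdNumber"\s+value="([^"]*)"/);
  return match ? match[1] : null;
}

// Sample markup as it might look after the script runs with str = "48213".
const samplePage =
  '<input size="5" type="hidden" name="rdNumber"  value="48213">';

console.log(extractCaptchaAnswer(samplePage)); // prints "48213"
```

No image recognition is involved at any point, which is why this check costs humans effort and bots nothing.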
[+] codabrink|14 years ago|reply
Wow, that is very surprising. Is the web development industry hurting that much for good programmers, or are the wrong people just being hired?
[+] sthatipamala|14 years ago|reply
Completely OT: I find it interesting that this post and several other HN posts this week are hosted on Google Plus. I definitely would not have predicted that G+ would encroach on the LiveJournal/Tumblr space.
[+] carbocation|14 years ago|reply
On a similar tangent to your OT post: we're getting to the point where seeing (plus.google.com) would be useful, since it conveys quite a different meaning to me from (google.com).
[+] smackfu|14 years ago|reply
And I am impressed by how bad their URL structure is.
[+] Garbage|14 years ago|reply
It's kind of a problem for me, as G+ is blocked on my office network. ;) :(
[+] ignifero|14 years ago|reply
Yep, for many people g+ will be the long form twitter. Google should throw some calendar / archive and search in there
[+] johnx123-up|14 years ago|reply
OT: (FUD) All HN users will move to G+?
[+] yid|14 years ago|reply
If anyone ever wondered what the phrase "cargo cult science" referred to, this is a prime example. They're going through all the motions, but sadly their understanding of the universe is gratuitously flawed.
[+] RyanMcGreal|14 years ago|reply
On a site I administer that used to be deluged in spam, I managed to eliminate it with a three-pass filter:

1. Simple mathematical question, e.g. "What do you get if you add five and three?" Answer is processed on the server.

2. Hidden form field that is supposed to remain blank.

3. Blacklist of common spam words.
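
The three passes above could be sketched on the server roughly like this. This is not the actual code from that site; the field names (mathAnswer, website, body) and the word list are assumptions made up for the example:

```javascript
// Hypothetical blacklist; a real one would be tuned to the site.
const SPAM_WORDS = ["viagra", "casino", "payday loan"];

// Returns true when a submission passes all three checks.
// form is a plain object of submitted fields (hypothetical names).
function passesSpamFilter(form, expectedAnswer) {
  // 1. Simple question, checked on the server rather than in the browser.
  const answer = String(form.mathAnswer || "").trim();
  if (answer !== String(expectedAnswer)) return false;

  // 2. Honeypot: the field is hidden from humans, so any value suggests a bot.
  if ((form.website || "") !== "") return false;

  // 3. Blacklist of common spam words, matched case-insensitively.
  const body = String(form.body || "").toLowerCase();
  return !SPAM_WORDS.some((word) => body.includes(word));
}

console.log(passesSpamFilter({ mathAnswer: "8", website: "", body: "Nice post" }, 8));
console.log(passesSpamFilter({ mathAnswer: "8", website: "http://x", body: "Nice post" }, 8));
```

The first call passes and the second is rejected by the honeypot. The cheap checks run first so that most junk is dropped before the word scan.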

[+] __david__|14 years ago|reply
On a forum I run (phpbb3) I eliminated 99% of the spam by adding 1 field that says "enter 42 here to prove you are human". No image, no hidden field, nothing.

We still get the occasional spammer but the real problem was our phpbb3 board showing up in the automated spam programs. As soon as we were slightly different than the default install, nearly all the spam stopped.

The interesting thing was that even the built-in captcha didn't stop the spam--it was worth cracking since everyone uses it.

[+] pornel|14 years ago|reply
When you design a solution, you have to decide whether you're protecting against a targeted or an untargeted attack. It's not all just "spam".

If your only concern is dumb, fully automated bots that aren't targeting your site specifically (which is true for the bottom 99.5% of the web), then you don't need a CAPTCHA.

2 and 3 are great against untargeted attacks. 1 is very weak protection against a targeted attack, and it's likely overkill that unnecessarily burdens users.

[+] patrickyeon|14 years ago|reply
2 and 3 are decent, as long as you don't have commenters trying to discuss something spammy (depends on the site community). #1 only works because your site isn't big enough for anybody to specifically target, though. I'm not saying it's bad (so long as it works, it's by definition at least "good enough"), just don't expect it to scale.
[+] nl|14 years ago|reply
We wanted to do something similar on a site I was involved with.

Unfortunately it wasn't allowed, because the site owner pointed out that the market the site was aimed at included a reasonable number of people with cognitive difficulties, i.e., they struggled to follow multi-step instructions.

(Yes, this does mean that computers are able to solve a problem that is supposed to identify a human much better than some humans.)

[+] panacea|14 years ago|reply
I've often thought captchas were doing it wrong.

Even my pre-school self could solve the Sesame Street "one of these things is not like the other".

There are so many sets with an odd-one-out that would only be easily determinable by a human over a computer.

[+] CoryMathews|14 years ago|reply
I have a few sites only getting about 1k visitors a month and #1 does reduce the spam a bit, but I still get 2-3 submissions a day, and I would not say these are targeted at all, just mass spam bots.
[+] alexitosrv|14 years ago|reply
If you are into this, you might find this review of a paper from Googlers interesting. It describes a CAPTCHA design in which humans are asked to select the right image rotation: http://glinden.blogspot.com/2009/05/exploiting-spammers-to-m...

As always, one of the most interesting parts of truly great CAPTCHA systems is that they advance the state of the art in image recognition. But on the other hand we still have scams like this, and no real solutions.

[+] ghurlman|14 years ago|reply
Sony... some part of me had really hoped that they would overreact to the hacking movement against them, and lock themselves down like Ft. Knox.

Instead, it would seem they're taking the "we'll get hacked anyway, so let's not waste our time" approach.

[+] dennisgorelik|14 years ago|reply
The Sony CAPTCHA we are discussing here was likely written years ago (before the Sony security vulnerability scandal).

It just indicates the pathetic state of Sony's security development team, something that cannot be changed overnight.

[+] adamtulinius|14 years ago|reply
A few years ago, or so I think, people went crazy talking about a replacement for captchas: show a range of images and have the user pick the image described by a block of text.

How come nobody adopted that approach?

[+] lbrandy|14 years ago|reply
Because the math doesn't work. Most "next-gen" captchas fundamentally fail (by orders of magnitude) one of the many pillars that make captchas scale...

1. Is it trivial for a human to answer correctly? This affects growth.

2. Can humans do it quickly? This affects growth.

3. How is the random guess-rate? This better be abysmal.

4. How good is the “opposing” technology?

5. How is the guess rate of a sophisticated attacker, using said technology?

6. How much human input is required to create your captcha? You better be asymptotically better than human-solving the captcha.

7. What are the cultural and accessibility issues?

[+] jerf|14 years ago|reply
"Anyone can invent a security system that he himself cannot break." - http://www.schneier.com/blog/archives/2011/04/schneiers_law....

Any CAPTCHA scheme that can be solved by enumeration of all possible answers is a failure, because there are cost-effective ways to hit a CAPTCHA over and over again, with cheap humans, and build the enumeration table. This is where the "pick the image with a cute thing in it" scheme falls down. In this case, once the enumeration of description -> image(s) is determined, you lose.

Any scheme that involves humans somehow creating tags or labeling images or writing text will generally be enumerable as well, because they can trivially out-manpower you.

Also, many CAPTCHA schemes use a model of spammer in which the spammer isn't permitted to be clever. If there is a pattern, in the real world the spammer is "allowed" to exploit it. There are 2^64 different ways to add two 32-bit numbers to each other, but that doesn't mean that you can beat a spammer just by asking the user to do a simple addition, because when I say "enumerate" I mean it more in the computer science sense, not the literal sense. They can and will create something that parses the problem and does it, so for instance for my stupid "add two random 32-bit numbers" example the CAPTCHA is actually easier for a computer than a human.
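
As a rough illustration of the addition example, here is what "parse the problem and do it" might look like. The question format is an assumption invented for this sketch; a real attacker would adapt the pattern to whatever the target site prints:

```javascript
// Hypothetical question format: "What is 3141592653 + 2718281828?"
function solveAdditionCaptcha(question) {
  // Grab every run of digits; expect exactly two operands.
  const numbers = question.match(/\d+/g);
  if (!numbers || numbers.length !== 2) return null;
  // BigInt keeps the sum exact no matter how large the operands are.
  return (BigInt(numbers[0]) + BigInt(numbers[1])).toString();
}

console.log(solveAdditionCaptcha("What is 3141592653 + 2718281828?")); // prints "5859874481"
```

The 2^64 possible questions never need to be listed; a few lines cover all of them, which is the "computer science sense" of enumeration.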

CAPTCHAs are hard and getting steadily harder... at least, if you require them to work. Security theater is easy.

[+] a3_nm|14 years ago|reply
If you only have a limited collection of images to pick from, then bots could get decent scores by picking at random. A better approach might be to ask users to pick all matching images (i.e., 2^N possible choices).
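
A quick back-of-the-envelope comparison of the two guess rates, assuming every answer is equally likely (a simplification; if the number of matching images is known or predictable, the subset case is easier to guess than 1 in 2^N):

```javascript
// Chance of passing by pure guessing when exactly one of n images is correct.
function singlePickGuessRate(n) {
  return 1 / n;
}

// Chance of passing by pure guessing when any subset of the n images
// could be the right answer (2^n subsets, assumed equally likely).
function subsetGuessRate(n) {
  return 1 / 2 ** n;
}

for (const n of [4, 9, 16]) {
  console.log(n, singlePickGuessRate(n), subsetGuessRate(n));
}
```

With 9 images, a random single pick succeeds 1 time in 9, while a random subset succeeds 1 time in 512.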
[+] DrewHintz|14 years ago|reply
What would the system use for its corpus of images and text descriptions? The corpus would have to be significantly larger than what any given attacker could manually identify. Once an attacker has manually identified an image+text combination, they could store the combination and use it to solve any future CAPTCHAs with the same image+text.
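
The store-and-replay attack described here needs very little machinery. A minimal sketch, with made-up identifiers (AnswerCache, the image IDs, and the prompt text are all hypothetical):

```javascript
// Hypothetical attacker-side cache: (image id, prompt) -> known answer.
class AnswerCache {
  constructor() {
    this.known = new Map();
  }

  // JSON-encode the pair so different (image, prompt) pairs never collide.
  static key(imageId, prompt) {
    return JSON.stringify([imageId, prompt]);
  }

  // Store an answer a human solved once.
  record(imageId, prompt, answer) {
    this.known.set(AnswerCache.key(imageId, prompt), answer);
  }

  // Return the stored answer, or null if this pair has not been seen.
  lookup(imageId, prompt) {
    const k = AnswerCache.key(imageId, prompt);
    return this.known.has(k) ? this.known.get(k) : null;
  }
}

const cache = new AnswerCache();
cache.record("img-0042", "Pick the kitten", "tile 3");
console.log(cache.lookup("img-0042", "Pick the kitten")); // prints "tile 3"
console.log(cache.lookup("img-0043", "Pick the kitten")); // prints null
```

Each human-solved challenge is paid for once and reused indefinitely, so the corpus has to grow faster than the attacker can fill a table like this.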
[+] ams6110|14 years ago|reply
Mainly because, to quote Spolsky, Users don't read instructions.

If the captcha is ANYTHING other than immediately obvious, a significant number of users will not be able to pass it.

[+] mixmastamyk|14 years ago|reply
Jesus, rootkits, psn, and now plaintext captchas ... the dev/it clowns at sony need to be fired en masse.
[+] dfc|14 years ago|reply
On the subject of terrible captcha systems, I found the following gem while looking for OSS games for Linux:

"You are born into WHAT? (answer is one english word)*" [1]

It is not entirely clear to me what the expected answer is. A google search for "you are born into" does not return any answer that is clearly correct. If I had to guess I would go with "sin" but I am hoping that nobody would be so ignorant as to design a captcha system that assumes a certain cultural/religious background.

[1] http://garden.sourceforge.net/drupal/?q=image/tid/3

[+] snorkel|14 years ago|reply
What about just asking the user "Why would a benevolent God allow evil to exist?" and then the server checks if the answer mentions "freewill"
[+] Turing_Machine|14 years ago|reply
A slightly less clueless (but still clueless) approach to CAPTCHA design is to 1) make the CAPTCHA case-sensitive, 2) use letters for which the lower-case representation is very similar to upper-case, and/or use both zero and the letter O, 1 and the letter l, and so on, 3) use an image munging algorithm that makes it next to impossible to disambiguate the cases in 2).
[+] Xk|14 years ago|reply
The problem with captchas is they have to be readable to humans.

Sure, a captcha of "lI0Ol1o" would probably be unreadable to a computer ... but it would be to a human too.

We're quickly approaching the point that image recognition is getting as good at solving image captchas as humans are, and when we do, we'll need to find some other way to do it.

[+] nl|14 years ago|reply
That's actually probably going to be easier for a computer to solve than a human.

A computer can do statistical sampling of many CAPTCHAs generated by the same website, and then try to reverse-engineer the image munging algorithm.

Humans, OTOH will probably give up after 2 tries and already struggle to get |O0Il1l right.

[+] hammock|14 years ago|reply
What I think is cool are the captchas that make fake words that actually look like they could be real words (as opposed to a random string of text). That makes them easier for a human to read and figure out, but no easier for a bot. I don't know how they do that.
[+] ignifero|14 years ago|reply
I dislike long, nonsensical captchas that confuse people; they're totally annoying. A few years ago I used a 5-digit captcha, but in the background I added faded small letters at various angles.
[+] Kwpolska|14 years ago|reply
DON'T use a bloody CAPTCHA.
[+] rlf|14 years ago|reply
I can't believe Google is criticizing how Sony does CAPTCHAs when I've been complaining for years about how difficult Google's are to read. But as to their point, based on Sony's recent security issues, it doesn't sound like Sony has a very good IT department.
[+] kijinbear|14 years ago|reply
It's not Google criticizing Sony, it's Andrew Hintz posting on his Google+ page.