Wow, that is very surprising. Is the web development industry hurting that much for good programmers, or are the wrong people just being hired?
Completely OT: I find it interesting that this post and several other HN posts this week are hosted on Google Plus. I definitely would not have predicted that G+ would encroach on the LiveJournal/Tumblr space.
On a similar tangent to your OT post: we're getting to the point where seeing (plus.google.com) would be useful, since it conveys quite a different meaning to me from (google.com).
If anyone ever wondered what the phrase "cargo cult science" referred to, this is a prime example. They're going through all the motions, but sadly their understanding of the universe is gratuitously flawed.
+1 for cargo cults: http://en.wikipedia.org/wiki/Cargo_cult. It's a great idea to keep in mind for a creator/designer/programmer. People/users/everyone all too often intuit through imitation.
On a forum I run (phpbb3) I eliminated 99% of the spam by adding 1 field that says "enter 42 here to prove you are human". No image, no hidden field, nothing.
We still get the occasional spammer but the real problem was our phpbb3 board showing up in the automated spam programs. As soon as we were slightly different than the default install, nearly all the spam stopped.
The interesting thing was that even the built-in captcha didn't stop the spam--it was worth cracking since everyone uses it.
When you design a solution, you have to decide whether you're protecting against targeted or non-targeted attacks. It's not all just "spam".
If your only concern is dumb, fully-automated bots not targeting your site specifically (which is true for the bottom 99.5% of the web) then you don't need a CAPTCHA.
2 and 3 are great against non-targeted attacks. 1 is very weak protection against a targeted attack, and it's likely overkill that unnecessarily burdens users.
2 and 3 are decent, as long as you don't have commenters trying to discuss something spammy (depends on the site community). #1 only works because your site isn't big enough for anybody to specifically target, though. I'm not saying it's bad (so long as it works, it's by definition at least "good enough"), just don't expect it to scale.
We wanted to do something similar on a site I was involved with.
Unfortunately it wasn't allowed because the site owner pointed out that the market the site was aimed at had a reasonable number of people with cognitive difficulties, i.e., they struggled to follow multi-step instructions.
(Yes, this does mean that computers are able to solve a problem that is supposed to identify a human much better than some humans.)
I have a few sites only getting about 1k visitors a month and #1 does reduce the spam a bit, but I still get 2-3 submissions a day, and I would not say these are targeted at all, just mass spam bots.
As always, one of the most interesting parts of truly great CAPTCHA systems is that they are advancing the state of the art in image recognition. But on the other hand we still have scams like this, and no real solutions.
A few years ago, or so I think, people went crazy talking about a replacement for captchas: show a range of images and make the user pick the image described by a block of text.
Because the math doesn't work. Most "next-gen" captchas fundamentally fail (by orders of magnitude) on one of the many pillars that make captchas scale....
1. Is it trivial for a human to answer correctly? This affects growth.
2. Can humans do it quickly? This affects growth.
3. How is the random guess-rate? This better be abysmal.
4. How good is the “opposing” technology?
5. How is the guess rate of a sophisticated attacker, using said technology?
6. How much human input is required to create your captcha? You better be asymptotically better than human-solving the captcha.
7. What are the cultural and accessibility issues?
Any CAPTCHA scheme that can be solved by enumeration of all possible answers is a failure, because there are cost-effective ways to hit a CAPTCHA over and over again, with cheap humans, and build the enumeration table. This is where the "pick the image with a cute thing in it" scheme falls down. In this case, once the enumeration of description -> image(s) is determined, you lose.
Any scheme that involves humans somehow creating tags or labeling images or writing text will generally be enumerable as well, because they can trivially out-manpower you.
Also, many CAPTCHA schemes use a model of spammer in which the spammer isn't permitted to be clever. If there is a pattern, in the real world the spammer is "allowed" to exploit it. There are 2^64 different ways to add two 32-bit numbers to each other, but that doesn't mean that you can beat a spammer just by asking the user to do a simple addition, because when I say "enumerate" I mean it more in the computer science sense, not the literal sense. They can and will create something that parses the problem and does it, so for instance for my stupid "add two random 32-bit numbers" example the CAPTCHA is actually easier for a computer than a human.
CAPTCHAs are hard and getting steadily harder... at least, if you require them to work. Security theater is easy.
If you only have a limited collection of images to pick from, then bots could get decent scores by picking at random. A better approach might be to ask users to pick matching images (ie. 2^N possible choices).
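To put rough numbers on the guess-rate point, here is a minimal sketch. It assumes a hypothetical bot that guesses uniformly at random and a challenge with exactly one correct answer; the function names are made up for illustration.

```python
from fractions import Fraction


def single_pick_guess_rate(num_images: int) -> Fraction:
    """Chance of a random guess when exactly one of num_images is correct."""
    return Fraction(1, num_images)


def subset_pick_guess_rate(num_images: int) -> Fraction:
    """Chance of a random guess when the answer is one specific subset.

    Each image is either selected or not, so there are 2**num_images
    possible answers and only one of them is right.
    """
    return Fraction(1, 2 ** num_images)


if __name__ == "__main__":
    for n in (4, 9, 16):
        print(n, single_pick_guess_rate(n), subset_pick_guess_rate(n))
```

With 9 images, picking a single image gives a random bot a 1 in 9 chance, while picking the right subset gives it 1 in 512. This only covers blind guessing; it says nothing about the enumeration attacks discussed elsewhere in the thread.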
What would the system use for its corpus of images and text descriptions? The corpus would have to be significantly larger than what any given attacker could manually identify. Once an attacker has manually identified an image+text combination, they could store the combination and use it to solve any future CAPTCHAs with the same image+text.
On the subject of terrible captcha systems: I found the following gem while looking for OSS games for Linux:
"You are born into WHAT? (answer is one english word)*" [1]
It is not entirely clear to me what the expected answer is. A Google search for "you are born into" does not return any answer that is clearly correct. If I had to guess I would go with "sin" but I am hoping that nobody would be so ignorant as to design a captcha system that assumes a certain cultural/religious background.
A slightly less clueless (but still clueless) approach to CAPTCHA design is to 1) make the CAPTCHA case-sensitive, 2) use letters for which the lower-case representation is very similar to upper-case, and/or use both zero and the letter O, 1 and the letter l, and so on, 3) use an image munging algorithm that makes it next to impossible to disambiguate the cases in 2).
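The opposite of the design described above can be sketched roughly as follows: draw the challenge text from an alphabet with the look-alike characters removed, and compare answers case-insensitively. The names `AMBIGUOUS`, `SAFE_ALPHABET`, `captcha_text` and `answers_match` are hypothetical, and this covers only the text side, not the image distortion.

```python
import secrets
import string

# Characters that are easy to confuse once rendered and distorted.
AMBIGUOUS = set("0OoIl1|")

# Uppercase letters and digits, minus the look-alikes.
SAFE_ALPHABET = "".join(
    ch for ch in string.ascii_uppercase + string.digits if ch not in AMBIGUOUS
)


def captcha_text(length: int = 6) -> str:
    """Pick random challenge text from the unambiguous alphabet."""
    return "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))


def answers_match(expected: str, submitted: str) -> bool:
    """Compare case-insensitively so users are not punished for letter case."""
    return expected.casefold() == submitted.strip().casefold()
```

Dropping the confusable characters shrinks the alphabet slightly, which is a small price next to users giving up after two failed attempts.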
The problem with captchas is they have to be readable to humans.
Sure, a captcha of "lI0Ol1o" would probably be unreadable to a computer... but it would be unreadable to a human too.
We're quickly approaching the point where image recognition is as good at solving image captchas as humans are, and when we get there, we'll need to find some other way to do it.
What I think is cool are the captchas that make fake words that actually look like they could be real words (as opposed to a random string of text). Makes it easier for a human to read and figure out, but no easier for a bot. I don't know how they do that.
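One plausible way to get word-like strings is to alternate consonants and vowels. This is only a guess at the general idea; nothing in the thread says how any real CAPTCHA generates its words, and `pseudo_word` and the letter sets below are hypothetical.

```python
import secrets

# Letter sets chosen for illustration; a real system might use syllable tables.
CONSONANTS = "bcdfghjkmnprstvwz"
VOWELS = "aeiou"


def pseudo_word(syllables: int = 3) -> str:
    """Build a pronounceable nonsense word from consonant-vowel pairs."""
    parts = []
    for _ in range(syllables):
        parts.append(secrets.choice(CONSONANTS))
        parts.append(secrets.choice(VOWELS))
    return "".join(parts)


if __name__ == "__main__":
    print(pseudo_word())
```

The output is easier for a person to read back than a random string of the same length, though it also has less randomness per character, so the word may need to be longer.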
I dislike long, nonsensical captchas that confuse people; they're totally annoying. A few years ago I used a 5-digit captcha, but in the background I added small, faded letters at various angles.
I can't believe Google is criticizing how Sony does CAPTCHAs when I've been complaining for years about how difficult Google's are to read. But as to their point, based on Sony's recent security issues, it doesn't sound like Sony has a very good IT department.
[+] [-] Slackwise|14 years ago|reply
An example would be https://sso.state.mi.us/som/dch/enroll/reg_page1.jsp (You can enter any fake name/email, this is only step one of the registration script. The next page has the captcha in question.)
The captcha is plaintext, right on the page. The data from the captcha isn't even sent to the server, it is processed locally via JavaScript.
So, the bots don't even have to do anything, but humans have to input a meaningless number...
[+] [-] codabrink|14 years ago|reply
[+] [-] sthatipamala|14 years ago|reply
[+] [-] carbocation|14 years ago|reply
[+] [-] smackfu|14 years ago|reply
[+] [-] Garbage|14 years ago|reply
[+] [-] ignifero|14 years ago|reply
[+] [-] johnx123-up|14 years ago|reply
[+] [-] yid|14 years ago|reply
[+] [-] drenei|14 years ago|reply
[+] [-] RyanMcGreal|14 years ago|reply
1. Simple mathematical question, e.g. "What do you get if you add five and three?" Answer is processed on the server.
2. Hidden form field that is supposed to remain blank.
3. Blacklist of common spam words.
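The three checks above could be sketched on the server side roughly like this. Everything here is an assumption for illustration: the form field names (`answer`, `website`, `body`), the word list, and the function `is_probably_spam` are hypothetical, and the expected answers would be stored server-side (for example in the session) rather than in the page.

```python
# Hypothetical blacklist; a real one would be tuned to the site.
SPAM_WORDS = {"viagra", "casino", "payday loan"}


def is_probably_spam(form: dict, expected_answers: set) -> bool:
    """Return True if a submission fails any of the three checks."""
    # 1. Simple question: compare with answers kept on the server.
    answer = form.get("answer", "").strip().lower()
    if answer not in expected_answers:
        return True

    # 2. Honeypot: the field is hidden from humans, so any value suggests a bot.
    if form.get("website", "") != "":
        return True

    # 3. Blacklist: reject bodies containing common spam words.
    body = form.get("body", "").lower()
    return any(word in body for word in SPAM_WORDS)


if __name__ == "__main__":
    ok = {"answer": "Eight", "website": "", "body": "Nice article."}
    print(is_probably_spam(ok, {"8", "eight"}))
```

Accepting both "8" and "eight" keeps the question easy for humans. None of this stops a bot written for the specific site, which is the targeted versus non-targeted distinction raised in the replies.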
[+] [-] __david__|14 years ago|reply
We still get the occasional spammer but the real problem was our phpbb3 board showing up in the automated spam programs. As soon as we were slightly different than the default install, nearly all the spam stopped.
The interesting thing was that even the built-in captcha didn't stop the spam--it was worth cracking since everyone uses it.
[+] [-] rufibarbatus|14 years ago|reply
Best CAPTCHA ever: http://random.irb.hr/signup.php
[+] [-] pornel|14 years ago|reply
If your only concern is dumb, fully-automated bots not targeting your site specifically (which is true for the bottom 99.5% of the web) then you don't need a CAPTCHA.
2 and 3 are great against non-targeted attacks. 1 is very weak protection against a targeted attack, and it's likely overkill that unnecessarily burdens users.
[+] [-] patrickyeon|14 years ago|reply
[+] [-] nl|14 years ago|reply
Unfortunately it wasn't allowed because the site owner pointed out that the market the site was aimed at had a reasonable number of people with cognitive difficulties, i.e., they struggled to follow multi-step instructions.
(Yes, this does mean that computers are able to solve a problem that is supposed to identify a human much better than some humans.)
[+] [-] panacea|14 years ago|reply
Even my pre-school self could solve the Sesame Street "one of these things is not like the other".
There are so many sets with an odd-one-out that would only be easily determinable by a human over a computer.
[+] [-] alexitosrv|14 years ago|reply
http://www.wolframalpha.com/input/?i=What+do+you+get+if+you+...
[+] [-] CoryMathews|14 years ago|reply
[+] [-] alexitosrv|14 years ago|reply
As always, one of the most interesting parts of truly great CAPTCHA systems is that they are advancing the state of the art in image recognition. But on the other hand we still have scams like this, and no real solutions.
[+] [-] ghurlman|14 years ago|reply
Instead, it would seem they're taking the "we'll get hacked anyway, so let's not waste our time" approach.
[+] [-] dennisgorelik|14 years ago|reply
It just indicates the pathetic state of Sony's security development team, something that cannot be changed overnight.
[+] [-] adamtulinius|14 years ago|reply
How come nobody adopted that approach?
[+] [-] lbrandy|14 years ago|reply
1. Is it trivial for a human to answer correctly? This affects growth.
2. Can humans do it quickly? This affects growth.
3. How is the random guess-rate? This better be abysmal.
4. How good is the “opposing” technology?
5. How is the guess rate of a sophisticated attacker, using said technology?
6. How much human input is required to create your captcha? You better be asymptotically better than human-solving the captcha.
7. What are the cultural and accessibility issues?
[+] [-] jerf|14 years ago|reply
Any CAPTCHA scheme that can be solved by enumeration of all possible answers is a failure, because there are cost-effective ways to hit a CAPTCHA over and over again, with cheap humans, and build the enumeration table. This is where the "pick the image with a cute thing in it" scheme falls down. In this case, once the enumeration of description -> image(s) is determined, you lose.
Any scheme that involves humans somehow creating tags or labeling images or writing text will generally be enumerable as well, because they can trivially out-manpower you.
Also, many CAPTCHA schemes use a model of spammer in which the spammer isn't permitted to be clever. If there is a pattern, in the real world the spammer is "allowed" to exploit it. There are 2^64 different ways to add two 32-bit numbers to each other, but that doesn't mean that you can beat a spammer just by asking the user to do a simple addition, because when I say "enumerate" I mean it more in the computer science sense, not the literal sense. They can and will create something that parses the problem and does it, so for instance for my stupid "add two random 32-bit numbers" example the CAPTCHA is actually easier for a computer than a human.
CAPTCHAs are hard and getting steadily harder... at least, if you require them to work. Security theater is easy.
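To illustrate the "parses the problem and does it" point, here is a minimal sketch of how little code an addition question takes to answer. The phrasing it matches ("add X and Y"), the word table, and the name `solve_addition` are assumptions for illustration only.

```python
import re
from typing import Optional

# Small lookup for spelled-out numbers; extend as needed.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def solve_addition(question: str) -> Optional[int]:
    """Return the sum if the question looks like 'add X and Y', else None."""
    match = re.search(r"add (\w+) and (\w+)", question.lower())
    if match is None:
        return None
    values = []
    for token in match.groups():
        if token.isdecimal():
            values.append(int(token))
        elif token in NUMBER_WORDS:
            values.append(NUMBER_WORDS[token])
        else:
            return None  # Unknown token: give up rather than guess.
    return sum(values)


if __name__ == "__main__":
    print(solve_addition("What do you get if you add five and three?"))
```

The size of the answer space does not matter here: two 32-bit operands are handled exactly the same way as "five and three", and faster than a human would manage.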
[+] [-] a3_nm|14 years ago|reply
[+] [-] DrewHintz|14 years ago|reply
[+] [-] ams6110|14 years ago|reply
If the captcha is ANYTHING other than immediately obvious, a significant number of users will not be able to pass it.
[+] [-] desaiguddu|14 years ago|reply
Here is my CAPTCHA research paper:
http://news.ycombinator.org/item?id=2754436
http://www.slideshare.net/desaiguddu/drag-and-drop-captcha-a...
[+] [-] mixmastamyk|14 years ago|reply
[+] [-] dfc|14 years ago|reply
"You are born into WHAT? (answer is one english word)*" [1]
It is not entirely clear to me what the expected answer is. A Google search for "you are born into" does not return any answer that is clearly correct. If I had to guess I would go with "sin" but I am hoping that nobody would be so ignorant as to design a captcha system that assumes a certain cultural/religious background.
[1] http://garden.sourceforge.net/drupal/?q=image/tid/3
[+] [-] snorkel|14 years ago|reply
[+] [-] Turing_Machine|14 years ago|reply
[+] [-] Xk|14 years ago|reply
Sure, a captcha of "lI0Ol1o" would probably be unreadable to a computer... but it would be unreadable to a human too.
We're quickly approaching the point where image recognition is as good at solving image captchas as humans are, and when we get there, we'll need to find some other way to do it.
[+] [-] nl|14 years ago|reply
A computer can do statistical sampling of many CAPTCHAs generated by the same website, and then try to reverse-engineer the image munging algorithm.
Humans, OTOH, will probably give up after 2 tries and already struggle to get |O0Il1l right.
[+] [-] hammock|14 years ago|reply
[+] [-] ignifero|14 years ago|reply
[+] [-] Kwpolska|14 years ago|reply
[+] [-] rlf|14 years ago|reply
[+] [-] kijinbear|14 years ago|reply