An Empirical Study and Evaluation of Modern CAPTCHAs

[+] caymanjim|2 years ago|reply

Google CAPTCHAs were designed and deployed as a mechanism to train AIs. That's why they are the way they are. Any security theater surrounding them is entirely incidental. So it's no surprise that the AIs are now good at solving them. We've trained them for years.

[+] noduerme|2 years ago|reply

All true, except: While these are considered just an excruciating security pain for users, they do serve a non-theatrical purpose in many cases of throttling the speed of brute force attacks (or at least costing your opponent money).

[+] rezonant|2 years ago|reply

This doesn't make sense. reCAPTCHA certainly does what it says on the tin. But the way it does it has almost nothing to do with the challenge the human sees. It's all behavioral analytics, including leveraging Google's collected data to determine how likely a user is a bot before they even load the page.

I'm not denying reCAPTCHA is a source of training data for Google -- surely there's no particular reason that every single reCAPTCHA V2 challenge is about identifying traffic objects, and it's not like Google is building a self-driving AI or anything.

But that's the business model, not the core feature.

And, that training data isn't just given to the developers of captcha solving bots.

[+] wouldbecouldbe|2 years ago|reply

I always thought they relied more on timing and mouse movement than on the correct answer to verify that you're a human.

[+] pyeri|2 years ago|reply

Once they are fully trained, how will websites ever distinguish between an intelligent bot and a real human? At least now, they are outsourcing that filtering to services like Cloudflare. But with this kind of training, how will even Cloudflare distinguish between a bot and a human?

[+] panny|2 years ago|reply

>So it's no surprise that the AIs are now good at solving them

Funnily enough, AI may be better at solving them than people. I've encountered many Google captchas which reject the correct answers, because you know... bots trained it to accept incorrect ones. Anyway, at least it's not stop signs anymore. It must have been truly embarrassing that Google was simultaneously selling "self driving" cars but at the same time demonstrating that stop sign recognition couldn't be done by robots.

[+] leobg|2 years ago|reply

I still find it funny that Google, with the advantage of having millions of Internet users train their AI like galley slaves for free, hasn’t yet been able to crack vision driven self driving. Tesla had no such advantage when training their FSD to recognize traffic lights, bicycles, motorcycles, etc.

[+] rezonant|2 years ago|reply

As best as I can tell this study explores many facets of how humans solve captchas. I couldn't find anything about AIs outperforming humans in the study. Can someone give me a section reference?

Solving reCAPTCHA v2/v3 requires more than just clicking the box and solving an image puzzle. If that were all it was, we would be overrun by now.

Lots of folks are commenting that the title's statement makes sense because CAPTCHAs are meant to train AIs. While this is broadly true, that's a nice side effect. Modern CAPTCHAs like reCAPTCHA v2+ work by monitoring behavioral analytics, from things like your browsing history to how your mouse moves on the page. This is why most of the time, most people only need to click a box. I'm not sure there's an LMM out there that includes mouse movement as a modality.

The kinds of AIs that are designed to beat CAPTCHAs also don't have the data from Google et al. to train on, unless we're concerned that Google is training its own bots to bypass CAPTCHAs. I suppose it's not inconceivable?

[+] croemer|2 years ago|reply

Yeah, the study is really not about AI solving captchas but how humans solve them. Quite a clickbait title - but those do well on HN unfortunately.

[+] mherrmann|2 years ago|reply

It's in Table 3.

[+] anonzzzies|2 years ago|reply

I guess validating a payment card is going to be the next step to sign up for whatever. Don't allow prepaid BINs and let's go. It's going to be pretty miserable, but someone needs to find something, as I currently would rather pay $0.01 than solve a captcha. Especially the "select all the bicycles" ones; it's a waste of life.

[+] nuz|2 years ago|reply

At this point the amount of friction added to all these things is pushing things towards just not doing them in the first place (buying less stuff, using social media less). Nature walks and paper books don't have captchas.

[+] mrtksn|2 years ago|reply

The next step is device attestation. IIRC Safari already does this, so you should not see captcha on places that support it.

Something that works on any browser could look like this: scan a QR code with an iPhone or Android device that supports attestation. The device asks whether you approve the login, then attests for you. If you turn out to be a bad actor, the website can ban that device, so there's no flooding from a single device.

[+] 2Gkashmiri|2 years ago|reply

Look up Indian UPI. "Validating a payment card" and all those snazzy bits are error-prone, archaic, and cost businesses a fortune.

In the UPI system, you are presented with a QR code or you input your UPI ID, you click pay, and it goes through.

If you are worried about "fraud protection", why rely on an intermediary like eBay or a credit card company when you could instead take it up with your bank, the seller, or the courts?

[+] shwouchk|2 years ago|reply

Please. Last time I had to solve a captcha it wasted 15 minutes (not exaggerating!) of my life, clicking on an endless stream of bikes, motorcycles, buses and stoplights. As punishment for using a VPN.

[+] joseda-hg|2 years ago|reply

I dread to think about that becoming the norm, I remember living in {Country} with 0 access to cards that would be accepted for anything international

[+] calderknight|2 years ago|reply

or just use Worldcoin

[+] gary_0|2 years ago|reply

Does HN ever require CAPTCHAs? It seems to do pretty well with its basic but battle-tested moderation/antispam tools, and rate-limiting that seems to repel all but the most concerted DDoS attacks. I don't think HN has any unreasonable restrictions on scraping or third-party clients, either. And it manages to serve 5M unique visitors a month and 10M views a day[0].

[0] https://news.ycombinator.com/item?id=33454140
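The thread doesn't say how HN's rate limiting actually works, so the following is only a generic sketch of one common approach, a per-client token bucket, and not HN's implementation. The class name, parameters, and numbers are all assumptions chosen for illustration; the caller passes timestamps in explicitly so the behavior is easy to reason about.

```python
class TokenBucket:
    """Hypothetical per-client rate limiter (not HN's real code).

    capacity    -- maximum burst of requests allowed at once
    refill_rate -- tokens added back per second
    """

    def __init__(self, capacity: float, refill_rate: float, start: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity      # the bucket starts full
        self.last_seen = start      # timestamp of the previous request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is allowed."""
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_seen = now
        # Each allowed request spends one token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A burst of four requests at t=0 with a capacity of three:
bucket = TokenBucket(capacity=3, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # the fourth is rejected
later = bucket.allow(now=1.0)                      # one token has refilled
```

The point of the design is that a normal reader never notices the limit, while a script hammering the site runs out of tokens quickly and gets throttled without anyone having to solve a puzzle.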

[+] arp242|2 years ago|reply

HN is also not really a very attractive target. The only thing you can do is post spam, and that's pretty low-value in terms of actual monetary value to the abuser, and tools to deal with that have been around for decades as you say.

This is very different from many other sites where the potential to make a buck is much more pronounced and direct.

[+] nextaccountic|2 years ago|reply

It struggles whenever there's a story more popular than usual though

[+] shiomiru|2 years ago|reply

IIRC the registration page (only in some cases?) shows a reCAPTCHA.

[+] jamiek88|2 years ago|reply

On one machine! :)

[+] ShamelessC|2 years ago|reply

They go down somewhat frequently. I think it’s like four 9’s? I’m not sure why they insist on running just a few machines though. They have more than enough money and probably make up the difference by the advertising for YC that they get.

[+] bongobingo1|2 years ago|reply

I can't tell if the audience of HN is more likely to script something untoward against HN, be that DDoS or just "check out my product" spam, because it's a bunch of hackers, or less likely to do it because (maybe) we like having nice things, or figure the audience is too in the know to fall for boring crypto spam.

[+] NotSammyHagar|2 years ago|reply

I find captchas extremely painful, because of the ambiguity and because the pictures don't all load. I wait for a minute and some never show. When they do load, so many are pics of bicycles and motorcycles and crosswalks. Are you supposed to click on the tiny piece that goes just past another tile or not? You can't refresh one that doesn't load; I think most of them start over if you refresh.

Like other people reported, if you ever use Tor, it's very common for the captchas to just not load. They just kind of hang without showing the pictures. Regular websites generally just work fine on Tor, so it seems to be a captcha problem.

[+] xlbuttplug2|2 years ago|reply

> Are you supposed to click on the tiny piece that goes just past another tile or not?

I ask myself this every time.

[+] lapcat|2 years ago|reply

I predicted this 7 years ago: "How will the machines take over? When CAPTCHAs become so hard that only AI can solve them, humans will be completely locked out of the net." https://twitter.com/lapcatsoftware/status/771857826130034688

[+] armchairhacker|2 years ago|reply

I thought this was already happening ~7 years ago. The "what text is in this image" captchas got a lot less common a while ago, and I think this was partly the reason why.

[+] dang|2 years ago|reply

Submitted title was "AI bots are now outperforming humans in solving CAPTCHAs", which broke HN's title rule: "Please use the original title, unless it is misleading or linkbait; don't editorialize."

Submitters: If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

[+] YeGoblynQueenne|2 years ago|reply

Misleadingly editorialised title. Actual title and abstract (which doesn't say anything about AIs "now" outperforming humans):

An Empirical Study & Evaluation of Modern CAPTCHAs

For nearly two decades, CAPTCHAs have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAs have continued to improve. Meanwhile, CAPTCHAs have also evolved in terms of sophistication and diversity, becoming increasingly difficult to solve for both bots (machines) and humans. Given this long-standing and still-ongoing arms race, it is critical to investigate how long it takes legitimate users to solve modern CAPTCHAs, and how they are perceived by those users.

In this work, we explore CAPTCHAs in the wild by evaluating users' solving performance and perceptions of unmodified currently-deployed CAPTCHAs. We obtain this data through manual inspection of popular websites and user studies in which 1,400 participants collectively solved 14,000 CAPTCHAs. Results show significant differences between the most popular types of CAPTCHAs: surprisingly, solving time and user perception are not always correlated. We performed a comparative study to investigate the effect of experimental context -- specifically the difference between solving CAPTCHAs directly versus solving them as part of a more natural task, such as account creation. Whilst there were several potential confounding factors, our results show that experimental context could have an impact on this task, and must be taken into account in future CAPTCHA studies. Finally, we investigate CAPTCHA-induced user task abandonment by analyzing participants who start and do not complete the task.

@dang, could you please correct the title? Thanks.

[+] alexnewman|2 years ago|reply

All of these papers miss that captchas have multiple levels of difficulty. People who get an enterprise account or work closely with the captcha providers will find very different results. Many captcha providers now decide which captchas to send out in hard mode based on what LLMs cannot solve.

Captchas are purposely not made too hard, as people like pex.com need to be able to bypass them for copyright enforcement. Note I'm biased, as I was a founder of hCaptcha.

[+] codedrivendev|2 years ago|reply

I think I prefer the recent CAPTCHAs (where you solve a puzzle by rotating an item, or finding the matching item). The older ones from years ago (deciphering mangled text and trying to work out if it is an `i`, `1` or `l`) were more annoying.

[+] croemer|2 years ago|reply

Bot operators can already pay human captcha solvers as the paper mentions. So all this does is potentially replace those humans with AI, driving down prices for bot operators.

As prices for bot operators decrease, website operators will increase the challenge and drive up effort for the intended website audience (humans) who are solving captchas instead of paying bots.

In the end, the website operators will have to stop using captchas as the intended website audience will no longer be willing to solve harder captchas.

Website operators can use alternatives, like asking for micro-payments, high enough to discourage most bot operators.

[+] tarruda|2 years ago|reply

> Website operators can use alternatives, like asking for micro-payments

Similarly to how dApps work in ethereum-like blockchains?

[+] drexlspivey|2 years ago|reply

Micropayments are not possible when Stripe/Visa/PayPal charge a 30-cent minimum fee.
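To put rough numbers on that, take the 30-cent minimum fee quoted here and the $0.01 price floated earlier in the thread. A quick sketch in integer cents follows; the function names are hypothetical, and actual processor pricing varies and hasn't been checked here.

```python
def fee_multiple(payment_cents: int, min_fee_cents: int = 30) -> float:
    """How many times larger the fixed minimum fee is than the payment."""
    if payment_cents <= 0:
        raise ValueError("payment must be positive")
    return min_fee_cents / payment_cents


def merchant_net_cents(payment_cents: int, min_fee_cents: int = 30) -> int:
    """What the site keeps after the fixed fee (negative means a loss)."""
    return payment_cents - min_fee_cents


# A one-cent payment under the assumed 30-cent minimum fee:
multiple = fee_multiple(1)      # 30.0: the fee is thirty times the payment
net = merchant_net_cents(1)     # -29: the site loses 29 cents per sign-up
```

Under those assumptions a site would have to charge more than 30 cents just to break even, which is no longer much of a micropayment.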

[+] tomschwiha|2 years ago|reply

We could simply reverse captchas now: if the captcha is solved it's a robot, otherwise it's a human.

[+] bamboozled|2 years ago|reply

We can't program a bot to fail?

[+] barbazoo|2 years ago|reply

I wish captcha providers universally had to provide a way to shut down their use by bad actors. Here in Canada I get tons of scam texts pointing me to a fake banking or postal service website asking me to pay a fake bill. I want to DDoS them with fake payment data but they're all protected by hCaptcha.

[+] jbd0|2 years ago|reply

I have been locked out of websites for solving a captcha so quickly that it thought I was a bot. So we went from requiring humans to solve a puzzle that bots can't to now requiring that humans solve the puzzle slower than bots do.

[+] olliej|2 years ago|reply

It's really amazing when we still get those text ones and nowadays you can literally select the text in many of the images and copy/paste into the input field.

[+] mherrmann|2 years ago|reply

The relevant data for the claim of the headline is in Table 3. On all the tasks with enough data, bots were both faster and more accurate than humans.

[+] renonce|2 years ago|reply

Yeah, the claim of the headline comes from the first sentence in Section 5.5. I think either the title should match the paper's title or that should be pointed out as part of the submission - not sure how HN's title guidelines work.

[+] mdale|2 years ago|reply

I think captchas will disappear within the next year or so. They were already only a soft determination of whether you're human.

[+] dasrecht|2 years ago|reply

So we now prove that we're human by failing those tests?

[+] JumpCrisscross|2 years ago|reply

Someone will get rich turning this into a browser plug-in.

[+] Geee|2 years ago|reply

My pet theory is that our whole simulated world is actually a huge captcha. Captchas keep evolving until you have to live an entire lifetime as a human to prove that you're a human. When you die you wake up and get access to a website.

329 comments