Google CAPTCHAs were designed and deployed as a mechanism to train AIs. That's why they are the way they are. Any security theater surrounding them is entirely incidental. So it's no surprise that the AIs are now good at solving them. We've trained them for years.
All true, except: While these are considered just an excruciating security pain for users, they do serve a non-theatrical purpose in many cases of throttling the speed of brute force attacks (or at least costing your opponent money).
This doesn't make sense. reCAPTCHA certainly does what it says on the tin. But the way it does it has almost nothing to do with the challenge the human sees. It's all behavioral analytics, including leveraging Google's collected data to determine how likely a user is a bot before they even load the page.
I'm not denying reCAPTCHA is a source of training data for Google -- surely there's no particular reason that every single reCAPTCHA V2 challenge is about identifying traffic objects, and it's not like Google is building a self-driving AI or anything.
But that's the business model, not the core feature.
And, that training data isn't just given to the developers of captcha solving bots.
Once they get fully trained then how will websites ever distinguish between an intelligent bot and real human? At least now, they are outsourcing that filtering to services like cloudflare. But with this kind of training, how will even cloudflare distinguish between bot and the human?
>So it's no surprise that the AIs are now good at solving them
Funnily enough, AI may be better at solving them than people. I've encountered many Google captchas which reject the correct answers, because you know... bots trained it to accept incorrect ones. Anyway, at least it's not stop signs anymore. It must have been truly embarrassing that Google was simultaneously selling "self driving" cars but at the same time demonstrating that stop sign recognition couldn't be done by robots.
I still find it funny that Google, with the advantage of having millions of Internet users train their AI like galley slaves for free, hasn’t yet been able to crack vision driven self driving. Tesla had no such advantage when training their FSD to recognize traffic lights, bicycles, motorcycles, etc.
As best as I can tell this study explores many facets of how humans solve captchas. I couldn't find anything about AIs outperforming humans in the study. Can someone give me a section reference?
Solving reCAPTCHA v2/v3 requires more than just clicking the box and an image puzzle. If that was all it was we would be overrun by now.
Lots of folks commenting that the title's statement makes sense because CAPTCHAs are meant to train AIs. While this is broadly true, that's a nice side effect. The way modern CAPTCHAs like reCaptcha V2+ work, is they monitor behavioral analytics-- from things like your browsing history to how your mouse moves on the page. This is why most of the time, most people only need to click a box. I'm not sure there's a LMM out there that includes mouse movement as a modality.
The kinds of AIs that are designed to beat CAPTCHAs also don't have the data from Google et al to use to train, unless we're concerned Google is training it's own bots to bypass CAPTCHAs, I suppose it's not inconceivable?
I guess validating a payment card is going to be the next step to sign up for whatever. Don’t allow pre paid BINs and let’s go. Gonna be pretty miserable, however someone needs to find something as I currently would rather pay 0.01$ instead of solving a captcha. Especially the select all the bicycles; it’s a waste of life.
At this point the amount of friction added to all these things is pushing things towards just not doing them in the first place (buying less stuff, using social media less). Nature walks and paper books doesn't have captchas.
The next step is device attestation. IIRC Safari already does this, so you should not see captcha on places that support it.
Something that can work on any browser can be like this:
Scan the QR code in your iPhone or Android device that supports attestation. Will ask you if you approve login, then will attest for you. If you turn out to be a bad actor, the website can ban this device - so no flooding with a single device.
look up indian UPI. "validating payment card" and all that snazzy bits are error prone, old, archaic and cost a fortune to businesses.
in upi system, you are presented with a QR code or you input your UPI ID, you click pay and it gets through.
if you are worried about "fraud protection", why rely on an intermediary like ebay or credit card company and instead should take up with your bank or the seller or courts.
Please. Last time I had to solve a captcha it was wasted 15 minutes (not exaggerating!) of my life, clicking on an endless stream of bikes, motorcycles, buses and stoplights. As punishment for using a vpn.
Does HN ever require CAPTCHAs? It seems to do pretty well with its basic but battle-tested moderation/antispam tools, and rate-limiting that seems to repel all but the most concerted DDoS attacks. I don't think HN has any unreasonable restrictions on scraping or third-party clients, either. And it manages to serve 5M unique visitors a month and 10M views a day[0].
HN is also not really a very attractive target. The only thing you can do is post spam, and that's pretty low-value in terms of actual monetary value to the abuser, and tools to deal with that have been around for decades as you say.
This is very different from many other sites where the potential to make a buck is much more pronounced and direct.
They go down somewhat frequently. I think it’s like four 9’s? I’m not sure why they insist on running just a few machines though. They have more than enough money and probably make up the difference by the advertising for YC that they get.
I cant tell if the audience of HN are more likely to script something untoward against HN, be that DDOS or just "check out my product" spam, because its a bunch of hackers - or less likely to do it because (maybe) we like having nice things, or figure the audience is too in the know to fall for boring crypto spam.
I find captchas extremely painful, because of ambiguity and not loading all the pictures. I wait for a minute and some never show. When they do load, so manyare pics of bicycles and motorcycles and cross walks. Are you supposed to click on the tiny piece that goes tojust past another tile or not? You can't refresh one that doesn't load, I think most of them start over if you refresh.
Like other people reported, if you ever use tor, it's very common for the captchas to just not load. They just kind of hang without showing the pictures. Regular websites generally just work fine on tor, it seems to be a captcha problem.
I thought this was already happening ~7 years ago. The "what text is in this image captchas" got a lot less common a while ago, and I think this was partly the reason why.
Submitted title was "AI bots are now outperforming humans in solving CAPTCHAs", which broke HN's title rule: "Please use the original title, unless it is misleading or linkbait; don't editorialize."
Submitters: If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
Misleadingly editorialised title. Actual title and abstract (which doesn't say anything about AIs "now" outperforming humans):
An Empirical Study & Evaluation of Modern CAPTCHAs
* For nearly two decades, CAPTCHAs have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAs have continued to improve. Meanwhile, CAPTCHAs have also evolved in terms of sophistication and diversity, becoming increasingly difficult to solve for both bots (machines) and humans. Given this long-standing and still-ongoing arms race, it is critical to investigate how long it takes legitimate users to solve modern CAPTCHAs, and how they are perceived by those users.*
* In this work, we explore CAPTCHAs in the wild by evaluating users' solving performance and perceptions of unmodified currently-deployed CAPTCHAs. We obtain this data through manual inspection of popular websites and user studies in which 1,400 participants collectively solved 14,000 CAPTCHAs. Results show significant differences between the most popular types of CAPTCHAs: surprisingly, solving time and user perception are not always correlated. We performed a comparative study to investigate the effect of experimental context -- specifically the difference between solving CAPTCHAs directly versus solving them as part of a more natural task, such as account creation. Whilst there were several potential confounding factors, our results show that experimental context could have an impact on this task, and must be taken into account in future CAPTCHA studies. Finally, we investigate CAPTCHA-induced user task abandonment by analyzing participants who start and do not complete the task.*
@dang, could you please correct the title? Thanks.
All of these papers miss that captchas have multiple levels of difficultly. People who get an enterprise account or work closely with the captcha providers will find very different results. Many captcha providers now decide what captchas to send out, in hard mode based on what LLMs cannot solve
Captchas are purposely not made too hard as people like pex.com need to be able to bypass them for copyright enforcement. Note I’m biased as I was a founder of hcaptcha
I think I prefer the recent CAPTCHAs (where you solve a puzzle by rotating an item, or finding the matching item). The older ones from years ago (deciphering mangled text and trying to work out if it is an `i`, `1` or `l` were more annoying)
Bot operators can already pay human captcha solvers as the paper mentions. So all this does is potentially replace those humans with AI, driving down prices for bot operators.
As prices for bot operators decrease, website operators will increase the challenge and drive up effort for the intended website audience (humans) who are solving captchas instead of paying bots.
In the end, the website operators will have to stop using captchas as the intended website audience will no longer be willing to solve harder captchas.
Website operators can use alternatives, like asking for micro-payments, high enough to discourage most bot operators.
I wish captcha providers universally had to provide a way to shut down their use by bad actors. Here in Canada I get tons of scam texts pointing me to a fake banking or postal service website asking me to pay a fake bill. I want to ddos them with fake payment data but they’re all protected by hcaptcha.
I have been locked out of websites for solving a captcha so quickly that it thought I was a bot. So we went from requiring humans to solve a puzzle that bots can't to now requiring that humans solve the puzzle slower than bots do.
It's really amazing when we still get those text ones and nowadays you can literally select the text in many of the images and copy/paste into the input field.
Yeah, the claim of the headline comes from the first sentence in Section 5.5. I think either the title should match the paper's title or that should be pointed out as part of the submission - not sure how HN's title guidelines work.
My pet theory is that our whole simulated world is actually a huge captcha. Captchas keep evolving until you have to live an entire lifetime as a human to prove that you're a human. When you die you wake up and get access to a website.
[+] [-] caymanjim|2 years ago|reply
[+] [-] noduerme|2 years ago|reply
[+] [-] rezonant|2 years ago|reply
I'm not denying reCAPTCHA is a source of training data for Google -- surely there's no particular reason that every single reCAPTCHA V2 challenge is about identifying traffic objects, and it's not like Google is building a self-driving AI or anything.
But that's the business model, not the core feature.
And, that training data isn't just given to the developers of captcha solving bots.
[+] [-] wouldbecouldbe|2 years ago|reply
[+] [-] pyeri|2 years ago|reply
[+] [-] panny|2 years ago|reply
Funnily enough, AI may be better at solving them than people. I've encountered many Google captchas which reject the correct answers, because you know... bots trained it to accept incorrect ones. Anyway, at least it's not stop signs anymore. It must have been truly embarrassing that Google was simultaneously selling "self driving" cars but at the same time demonstrating that stop sign recognition couldn't be done by robots.
[+] [-] leobg|2 years ago|reply
[+] [-] rezonant|2 years ago|reply
Solving reCAPTCHA v2/v3 requires more than just clicking the box and an image puzzle. If that was all it was we would be overrun by now.
Lots of folks commenting that the title's statement makes sense because CAPTCHAs are meant to train AIs. While this is broadly true, that's a nice side effect. The way modern CAPTCHAs like reCaptcha V2+ work, is they monitor behavioral analytics-- from things like your browsing history to how your mouse moves on the page. This is why most of the time, most people only need to click a box. I'm not sure there's a LMM out there that includes mouse movement as a modality.
The kinds of AIs that are designed to beat CAPTCHAs also don't have the data from Google et al to use to train, unless we're concerned Google is training it's own bots to bypass CAPTCHAs, I suppose it's not inconceivable?
[+] [-] croemer|2 years ago|reply
[+] [-] mherrmann|2 years ago|reply
[+] [-] anonzzzies|2 years ago|reply
[+] [-] nuz|2 years ago|reply
[+] [-] mrtksn|2 years ago|reply
Something that can work on any browser can be like this: Scan the QR code in your iPhone or Android device that supports attestation. Will ask you if you approve login, then will attest for you. If you turn out to be a bad actor, the website can ban this device - so no flooding with a single device.
[+] [-] 2Gkashmiri|2 years ago|reply
in upi system, you are presented with a QR code or you input your UPI ID, you click pay and it gets through.
if you are worried about "fraud protection", why rely on an intermediary like ebay or credit card company and instead should take up with your bank or the seller or courts.
[+] [-] shwouchk|2 years ago|reply
[+] [-] joseda-hg|2 years ago|reply
[+] [-] calderknight|2 years ago|reply
[+] [-] gary_0|2 years ago|reply
[0] https://news.ycombinator.com/item?id=33454140
[+] [-] arp242|2 years ago|reply
This is very different from many other sites where the potential to make a buck is much more pronounced and direct.
[+] [-] nextaccountic|2 years ago|reply
[+] [-] shiomiru|2 years ago|reply
[+] [-] jamiek88|2 years ago|reply
[+] [-] ShamelessC|2 years ago|reply
[+] [-] bongobingo1|2 years ago|reply
[+] [-] unknown|2 years ago|reply
[deleted]
[+] [-] unknown|2 years ago|reply
[deleted]
[+] [-] unknown|2 years ago|reply
[deleted]
[+] [-] unknown|2 years ago|reply
[deleted]
[+] [-] NotSammyHagar|2 years ago|reply
Like other people reported, if you ever use tor, it's very common for the captchas to just not load. They just kind of hang without showing the pictures. Regular websites generally just work fine on tor, it seems to be a captcha problem.
[+] [-] xlbuttplug2|2 years ago|reply
I ask myself this every time.
[+] [-] lapcat|2 years ago|reply
[+] [-] armchairhacker|2 years ago|reply
[+] [-] dang|2 years ago|reply
Submitters: If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
[+] [-] YeGoblynQueenne|2 years ago|reply
An Empirical Study & Evaluation of Modern CAPTCHAs
* For nearly two decades, CAPTCHAs have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAs have continued to improve. Meanwhile, CAPTCHAs have also evolved in terms of sophistication and diversity, becoming increasingly difficult to solve for both bots (machines) and humans. Given this long-standing and still-ongoing arms race, it is critical to investigate how long it takes legitimate users to solve modern CAPTCHAs, and how they are perceived by those users.* * In this work, we explore CAPTCHAs in the wild by evaluating users' solving performance and perceptions of unmodified currently-deployed CAPTCHAs. We obtain this data through manual inspection of popular websites and user studies in which 1,400 participants collectively solved 14,000 CAPTCHAs. Results show significant differences between the most popular types of CAPTCHAs: surprisingly, solving time and user perception are not always correlated. We performed a comparative study to investigate the effect of experimental context -- specifically the difference between solving CAPTCHAs directly versus solving them as part of a more natural task, such as account creation. Whilst there were several potential confounding factors, our results show that experimental context could have an impact on this task, and must be taken into account in future CAPTCHA studies. Finally, we investigate CAPTCHA-induced user task abandonment by analyzing participants who start and do not complete the task.*
@dang, could you please correct the title? Thanks.
[+] [-] alexnewman|2 years ago|reply
Captchas are purposely not made too hard as people like pex.com need to be able to bypass them for copyright enforcement. Note I’m biased as I was a founder of hcaptcha
[+] [-] codedrivendev|2 years ago|reply
[+] [-] croemer|2 years ago|reply
As prices for bot operators decrease, website operators will increase the challenge and drive up effort for the intended website audience (humans) who are solving captchas instead of paying bots.
In the end, the website operators will have to stop using captchas as the intended website audience will no longer be willing to solve harder captchas.
Website operators can use alternatives, like asking for micro-payments, high enough to discourage most bot operators.
[+] [-] tarruda|2 years ago|reply
Similarly to how dApps work in ethereum-like blockchains?
[+] [-] drexlspivey|2 years ago|reply
[+] [-] tomschwiha|2 years ago|reply
[+] [-] bamboozled|2 years ago|reply
[+] [-] barbazoo|2 years ago|reply
[+] [-] jbd0|2 years ago|reply
[+] [-] olliej|2 years ago|reply
[+] [-] mherrmann|2 years ago|reply
[+] [-] renonce|2 years ago|reply
[+] [-] mdale|2 years ago|reply
[+] [-] dasrecht|2 years ago|reply
[+] [-] JumpCrisscross|2 years ago|reply
[+] [-] Geee|2 years ago|reply