Is it assumed that humans perform 100% against this captcha? Because being one of those humans it’s been closer to 50% for me
I’m guessing Google is evaluating more than whether the answer was correct enough (ie does my browser and behavior look like a bot?), so that may be a factor
Wow. Cross-tile performance was 0-2%. That's the challenge where you select all of the tiles containing an item where the single item is in a subset of tiles. As opposed to all the tiles that contain the item type (static - 60% max) and the reload version (21% max). Seems to really highlight how far these things are from reasoning or human level intelligence. Although to be fair, the cross-tile is the one I perform worst on too (but more like 90+% rather than 2%).
Same! As we talk about in the article, the failures were less from raw model intelligence/ability than from challenges with timing and dynamic interfaces
Hcaptcha cofounder here. Enterprise users have a lot of fancy configuration behind the scenes. I wonder if they coordinated with recaptcha or just assumed their sitekey is the same as others'
There is a doom loop mode where it doesn't matter how many you solve or even if you get them correct. My source for this works on this product at Google.
The buses and fire hydrants are easy. It is the bicycles. If it goes a pixel over the next box, do I select the next box? Is the pole part of the traffic light? And the guy as you say. There is a special place in hell for the inventor of reCaptcha (and for all of Cloudflare staff as far as I am concerned!)
Didn't look a lot into this but I think the fact that humans are willing to do this in the "cents per thousand" or something range means that it's really hard to get much interest in automating it
Not sure if it's your case, but I think I sometimes had to solve many of them when I was rushing through my daily tasks. My hypothesis is that I solve them too fast for the "average human solving duration" recaptcha seems to expect (I think solving it too fast triggers the bot fingerprint). More recently, when I run into a recaptcha, I consciously do not rush it, and I feel I no longer have to solve more than one. I don't think I have super powers, but as a tech guy I do a lot of computing things mechanically.
Just select the audio option. It's faster and easier. Maybe it's because Google doesn't care about training on speech to text. I usually write something random for one word and get the other word correct. I can even write "bzzzzt" at the beginning. They don't care because they aren't focused on training on that data.
Now I think of it, it's really a failure that AI didn't use this and went with guessing which square of an image to select.
I always assume that people are lazy and try to click as few squares as possible to get broadly the correct answer. Therefore, if it says motorbikes, just click on the body of the bike and leave out the rider and tiles with hardly any bike in them.
If it says traffic lights just click on the ones you can see lit and not the posts and ignore them if they are too far in the distance. Seems to work for me.
The other fun thing is the complete lack of localisation for people not from the US. "Select the squares with crosswalks" - with what? Oh, right, the pedestrian crossings... And the fire hydrants look like the ones we've seen in movies, it's like, oh yeah those do exist in real life!
> do you select the guy riding it? do you select the post?
Just select as _you_ would. As _you_ do.
Imperfection and differing judgments are inherent to being human. The CAPTCHA also measures your mouse movement on the X and Y axes and the timing of your clicks.
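As a rough illustration of the click-timing idea above, here is a toy sketch. It is not how reCAPTCHA actually works; the function name `looks_automated` and both thresholds are invented for this example.

```python
from statistics import mean, pstdev


def looks_automated(click_times_s, min_mean_gap=0.25, min_jitter=0.05):
    """Toy heuristic: flag click sequences that are too fast or too regular.

    click_times_s: click timestamps in seconds, in ascending order.
    Both thresholds are made up for illustration, not taken from any real system.
    """
    if len(click_times_s) < 3:
        return False  # too few clicks to say anything
    # Time between consecutive clicks
    gaps = [b - a for a, b in zip(click_times_s, click_times_s[1:])]
    too_fast = mean(gaps) < min_mean_gap      # clicking faster than a person would
    too_regular = pstdev(gaps) < min_jitter   # suspiciously even spacing
    return too_fast or too_regular
```

A real system would presumably combine many more signals (mouse path, browser fingerprint, IP reputation), so treat this only as a picture of why rushing through the tiles might look suspicious.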
While running this I looked at hundreds and hundreds of captchas. And I still get rejected on like 20% of them when I do them. I truly don't understand their algorithm lol
To this day I hate captchas. Back when it was genuinely helping to improve OCR for old books, I loved that in the same way I loved folding@home, but now I just see these widgets as a fundamentally exclusionary and ableist blocker. People with cognitive, sight, motor, (and many other) impairments are at a severe disadvantage (and no, audio isn't a remedy, it is just shifting to other ableisms). You can add as many aria labels as you like but if you're relying on captchas, you are not accessible. It really upsets me that these are now increasing in popularity. They are not the solution. I don't know what is, but this ain't it.
So, when do we reach a level where AI is better than humans and we remove captcha from pages altogether? If you don't want bots to read content, don't put it online, you're just inconveniencing real people now.
They can also sign up and post spam/scams. There are a lot of those spam bots on YouTube, and there probably would be a lot more without any bot protection. Another issue is aggressive scrapers effectively DOSing a website. Some defense against bots is necessary.
Forget whether humans can distinguish your AI from another human. The real Turing test is whether your AI passes all the various flavors of captcha checks.
The solvers are a problem but they give themselves away when they incorrectly fake devices or run out of context. I run a bot detection SaaS and we've had some success blocking them. Their advertised solve times are also wildly inaccurate. They take ages to return a successful token, if at all. The number of companies providing bot mitigation is also growing rapidly, making it difficult for the solvers to stay on top of reverse engineering etc.
I'd have a job with the first cross-tile one shown saying select squares with motorcycles. Does the square above the handle bars appearing to maybe contain part of a rear view mirror count? I'm not surprised the LLMs were failing on those.
At this point I am convinced all captchas almost entirely rely on IP reputation. Even on Linux with hardened Firefox you can get stuck in an infinite loop with one IP but then switch to another one that lets you in after 0-2 tries.
Isn't calling Browser Use an "open source framework" a bit misleading? It looks like a commercial product that requires an API key to use even if you run the source.
Interesting results. Why do reload/cross-tile have worse results? Would be nice to see some examples of failed results (how close did it get to solving?)
We have an example of a failed cross-tile result in the article - the models seem like they're much better at detecting whether something is in an image vs. identifying the boundaries of those items. This probably has to do with how they're trained - if you train on descriptions/image pairs, I'm not sure how well that does at learning boundaries.
Reload challenges are hard because of how the agent-action loop works. But the models were pretty good at identifying when a tile contained an item.
I'm also curious what the success rates are for humans. Personally I find those two the most bothersome as well. Cross-tile because it's not always clear which parts of the object count and reload because it's so damn slow.
Indeed, captcha vs captcha bot solvers has been an ongoing war for a long time. Considering all the cybercrime and ubiquitous online fraud today, it's pretty impressive that captchas have held the line as long as they have.
Ok and then? Those models were not trained for this purpose.
It's like the last hype over using generative AI for trading.
You might use it for sentiment analysis, summarization and data pre-processing. But classic forecast models will outperform them if you feed them the right metrics.
It is relevant because they are trained for the purpose of browser use and completing tasks on websites. Being able to bypass captchas is important for using many websites.
It would be nice to see comparisons to some special-purpose CAPTCHA solvers though.
I’ve used LLMs to solve captchas for shits and giggles, just taking a screenshot and pasting it into ChatGPT and having it tell me what squares to click and I think it solves them better than I do.
Can we just get rid of them now, they are so annoying and basically useless.
jameslk|3 months ago
daveguy|3 months ago
RobertDeNiro|3 months ago
flakiness|3 months ago
criddell|3 months ago
mdahardy|3 months ago
swyx|3 months ago
alexnewman|3 months ago
amirhirsch|3 months ago
Xenoamorphous|3 months ago
Also, when they ask you to identify traffic lights, do you select the post? And when it’s motorcycles or bicycles, do you select the guy riding it?
Sayrus|3 months ago
Either that or it was never about the buses and fire hydrants.
terminalshort|3 months ago
cm2187|3 months ago
datadrivenangel|3 months ago
hnburnsy|3 months ago
sixhobbits|3 months ago
utopman|3 months ago
nistiminic|3 months ago
felixfurtak|3 months ago
stephen_g|3 months ago
perfmode|3 months ago
mdahardy|3 months ago
Semaphor|3 months ago
jameslk|3 months ago
This type of captcha is too infuriating so I always skip it until I get the ones where I’m just selecting an entire image, not parts of an image
Google’s captchas are too ambiguous and might as well be answered philosophically with an essay-length textbox
nwellinghoff|3 months ago
xnx|3 months ago
Will be interesting to see how Gemini 3 does later this year.
mdahardy|3 months ago
bena|3 months ago
padolsey|3 months ago
TulliusCicero|3 months ago
I also perform poorly on cross-tile, I never know whether to count a tiny bit of a bicycle in a square as "a bike in that square".
ajsnigrutin|3 months ago
cubefox|3 months ago
rkagerer|3 months ago
timshell|3 months ago
kjok|3 months ago
arbol|3 months ago
tim333|3 months ago
akimbostrawman|3 months ago
VectorLock|3 months ago
maknee|3 months ago
mdahardy|3 months ago
Youden|3 months ago
cedws|3 months ago
mdahardy|3 months ago
PaulHoule|3 months ago
mdahardy|3 months ago
golfer|3 months ago
mehdibl|3 months ago
daveguy|3 months ago
https://ai.google.dev/gemini-api/docs/image-understanding
Legend2440|3 months ago
throwawayu5pg|3 months ago
guluarte|3 months ago
theoldgreybeard|3 months ago
WhereIsTheTruth|3 months ago
mdahardy|3 months ago
sjapps|3 months ago
[deleted]
jngiam1|3 months ago