elehack | 6 years ago

Google did not create reCAPTCHA. They bought it; it was started by Luis von Ahn, who went on to create Duolingo.

When reCAPTCHA was created, the alternative was CAPTCHA, which tried to impede bots but did not generate any social benefit. This was the genius of the original reCAPTCHA concept: the time taken to 'confirm humanity' could be channeled into the socially-useful endeavor of digitizing books. Capture some of the heat emissions of impeding bots for a useful purpose, rather than letting it all go to waste.

Now, yes, Google is using it to train their self-driving car AI, and there's a bunch else happening in it to connect to Google's surveillance apparatus. There's much to legitimately criticize there. I personally don't view training Google's proprietary AI as the same kind of intrinsically altruistic purpose as digitizing the world's pre-digital books.

But putting the entire concept on blast with erroneous history that can be corrected with about 60 seconds on Wikipedia doesn't help the argument at all.

avip|6 years ago

I’m hereby inventing a new rule called Chesterton’s Wild Boar fence, the essence of which is people who don’t have gardens, or don’t hang out at night, would always complain about wild boar fences, as they lack any awareness of the beast and its damage, or they downright believe its mere existence is a myth.

GauntletWizard|6 years ago

I second this notion. I've been using sharks as an example: Sharks aren't dangerous when swimming, because we don't swim in deep water, because sharks are dangerous. Shark attacks in deep water beaches are fatal at roughly the same rate as riding a bike without a helmet. [1]

The risk of sharks is tempered by our experience with them. Few people swim in deep water beaches (because they have signs saying "Danger! Sharks!") And those that do typically take appropriate precautions and maintain awareness.

Sharks don't want to eat you and do quickly let go of swimmers they attack, but that's irrelevant because the damage has already been done. When I was young, a large amount of education was put into stating that shark attacks were rare, and it's true both in absolute numbers and by comparison with how feared they are. Jaws and knockoffs spread irrational fear in the 70s/80s, and my early 90s childhood came with the counterpressure there, but that counterpressure caused many in my peer group to misunderstand the risks. Shark attacks drop off hard to zero if you're swimming in shallow water. Even at 10 meters, which is not uncommon for surfers, they are a real risk. But surfers spend little time at 10 meters out. All of this forms a balance.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3941575/

nrb|6 years ago

I think this can be summed up with a single word: incomprehension. Or in the latter example: ignorance

xenocyon|6 years ago

Saying it "impedes bots" is a little generous; it impedes humans as well. Or rather: it works on a spectrum where bots are at one end (fully obstructed), easily tracked humans at the other (free entry), and humans who disable tracking devices and/or eschew Google services somewhere in the middle (allowed to pass after much hassle).

polskibus|6 years ago

It definitely impedes Firefox users as opposed to Chrome users.

szhu|6 years ago

To be fair, having a tracked history does make it easier to prove that you are human.

sli|6 years ago

reCAPTCHA generally only works for me after I do it 3-4 times. Purely because I use a VPN. reCAPTCHA v3, curiously, works just fine when I'm using a VPN (if I allow it to run in the first place).

djsumdog|6 years ago

Yep, it was originally run by Carnegie Mellon (as you mentioned, by its creator Luis von Ahn and others).

This article also doesn't seem to touch on the newer reCaptcha that tracks you everywhere on a website (you'll notice a little blue box on the bottom right with the logo where this happens), not just on login or user input pages.

There is a lot to criticize about reCaptcha, including privacy concerns for sure, and there were some other posts about it on HN before.

zelphirkalt|6 years ago

So basically it has been perverted and now acts in ways that harm people, like most things Google touches.

iagooar|6 years ago

I wonder if in some jurisdiction Google shouldn't pay money for forcing people to train their AI. I imagine it could be possible to do that in Germany, or under some EU laws.

iamnotacrook|6 years ago

Perhaps in Germany Google can charge for their services, but waive the fee if they solve captchas.

gambiting|6 years ago

Am I the only person here who always entered absolute nonsense for the scanned word? The original reCaptcha had two words, one which was clearly generated and another which was clearly scanned - to "solve" the captcha all you needed to do was to enter the generated word correctly, the other could be literally anything. So I always entered banana or something similar for literally everything.

mherdeg|6 years ago

You're not the only person -- I have a friend who did this, also generally inserting silly words in the side he guessed was scanned from a book.

I have a hunch that von Ahn knew this would happen and the same scan is shown to multiple users before a word is chosen.
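A rough sketch of how that hunch could work, assuming a hypothetical scheme in which only the known (control) word decides whether you pass, and the scanned word is accepted only once enough users agree on it. The function names and the vote thresholds are made up for illustration and are not taken from reCAPTCHA itself:

```python
from collections import Counter


def passes_captcha(control_answer: str, control_truth: str) -> bool:
    """Only the known (control) word decides whether the user passes."""
    return control_answer.strip().lower() == control_truth.strip().lower()


def consensus_word(guesses, min_votes=3, min_share=0.6):
    """Return the agreed transcription of the scanned word, or None.

    min_votes and min_share are hypothetical thresholds: a word is accepted
    only if enough users typed it and they form a clear majority.
    """
    normalized = [g.strip().lower() for g in guesses if g.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None
```

Under these assumptions, a lone "banana" among several matching answers is simply outvoted, while a word with no agreement yet stays unresolved and would be shown to more users.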

dessant|6 years ago

reCAPTCHA v2 blocks [1] people with disabilities from accessing basic services on the web, such as registering to vote, paying utilities, filing taxes, or accessing medical services. This practice is likely illegal, and the sites which facilitate it may be legally liable.

reCAPTCHA v3 has no user interface; it only returns a score upon which the site operator can act, often delaying or blocking access [2] to services. In this case the responsibility falls entirely on the site, while Google is no longer at risk of being found liable for the damage caused by its discriminatory service.
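To make concrete what "acting on a score" can look like on the site's side, here is a minimal sketch. It assumes the site has already sent the token to Google's verification endpoint and parsed the JSON reply into a dict with `success`, `score` (0.0 to 1.0, higher meaning more likely human) and `action` fields. The function name `decide`, the three outcomes, and both thresholds are hypothetical choices for illustration, not anything Google prescribes:

```python
def decide(verification: dict, expected_action: str,
           allow_at: float = 0.7, challenge_at: float = 0.3) -> str:
    """Map a parsed verification reply to 'allow', 'challenge' or 'block'.

    allow_at and challenge_at are assumed, site-chosen thresholds.
    """
    # A failed or missing verification is never trusted.
    if not verification.get("success"):
        return "block"
    # The token should have been issued for the action being protected.
    if verification.get("action") != expected_action:
        return "block"
    score = float(verification.get("score", 0.0))
    if score >= allow_at:
        return "allow"
    if score >= challenge_at:
        # Middle band: fall back to some other check (email, 2FA, ...).
        return "challenge"
    return "block"
```

The middle band is where the hassle described in this thread lands: a user on a VPN or with tracking blocked may score low without being a bot, so what the site does in that band decides whether they get through.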

reCAPTCHA v3 works best when it is embedded on every page of a site [3]. The service collects detailed interaction data on every website you visit which has implemented it. The extent of tracking is similar to Google Analytics, but you cannot block it, otherwise you lose access to large portions of the web.

The collected data is highly sensitive: it contains not only your browsing history but also a detailed snapshot of your actions on sites. Mouse movements can reveal health issues which affect your motor functions, and your interests and desires are laid bare based on how you interact with content.

Google must be compelled to disclose in the reCAPTCHA privacy policy what data is collected and how that data is used. Journalists have asked Google for years to clarify how the data collected by the reCAPTCHA service is being used, and their answer is always the same: we only use your data to provide the reCAPTCHA service, and it is not used to personalize ads.

The problem is that those are just words from their PR department; the legally binding documents are the privacy policy and the terms of service. reCAPTCHA uses the same privacy policy as the rest of the Google services, which gives them the right to use your data for ad personalization.

You must resist adding reCAPTCHA v2 and v3 to your sites. There are alternatives [4] which could offer the same level of protection for your services, when used the right way. Their implementation may not be as convenient as reCAPTCHA is, but that is the price you must pay to prevent Google from mining our personal data and our every interaction on the web.

People are forced to hand over their personal data to Google at all times, otherwise they face losing access to services, and being excluded from societal processes that are increasingly happening exclusively online.

This is where privacy rights and human rights are violated, and it is up to all of us to make our voices heard, so that existing legislation is enforced, and new laws are put in place to prevent companies from abusing and exploiting us.

Handing over our data to Google must not be a condition to fully participate in society.

[1] https://github.com/w3c/apa/issues/25

[2] https://news.ycombinator.com/item?id=20295333

[3] https://developers.google.com/recaptcha/docs/v3

[4] https://www.w3.org/TR/turingtest/

My1|6 years ago

Couldn't one use U2F as a captcha alternative, obviously without information about the stick itself, only the batch attestation, and then discarding the registration? After all, it does need an interaction in meatspace. Sure, a bot could be engineered to trigger it, but you can't just relay the challenge somewhere and have someone else clear it for you, and even if you have a Lego construction or whatever to clear your captcha, it's FAR slower than having many people on a solving service help you.

clairity|6 years ago

exactly. google should be banned from all online public services of any kind, since they can't be avoided. it's unreasonable to expect people to shop around for a town to live in that doesn't, and never will, use privacy-invading google services like recaptcha.

i'd even support a ban for other core services like utilities and banking that may not be public entities.

xtracto|6 years ago

This gives me a thought: What if someone created a service that channels Amazon Mechanical Turk tasks as CAPTCHAs so that you(r website) could make a buck off those people solving captchas?

loco5niner|6 years ago

They did the same thing to collect walking data using the game Ingress, and Goog-411 to collect voice data.

Well specifically, Niantic, which was a google internal thing.

jamesgeck0|6 years ago

I could swear that close to the launch of Ingress some Niantic employee said something along the lines of, "We're not actually collecting much data. It's all secretly an evil plan to get nerds to exercise," but I can't find a source.

Paul-ish|6 years ago

Could an organization do the same thing, but as a non-profit with open datasets? This way everyone benefits.

inlined|6 years ago

> but putting the entire concept on blast with erroneous history that can be corrected with about 60 seconds on Wikipedia doesn’t help the article at all.

Nor does an entirely fallacious premise. reCAPTCHA v3 is entirely transparent and non invasive to users. In fact it’s retroactive to help the site admin figure out what to do with the score:

https://developers.google.com/recaptcha/docs/v3

worble|6 years ago

>reCAPTCHA v3 is entirely transparent and non invasive to users

Except when you opt out of Google tracking by blocking third-party scripts, in which case your life still gets to be hell.

lucideer|6 years ago

> non invasive to users

... who have opted fully and completely into all Google tracking and have previously participated in Google's ecosystem.

Pretty odd definition of "entirely fallacious".

mirimir|6 years ago

That's what they say, but it rarely works out that way for me. And yes, it's probably because I always block all tracking, and use VPNs and/or Tor.

mustacheemperor|6 years ago

> non invasive to users.

Except for the invasion of my time and attention, used to train Google's AI to get better at recognizing traffic signals. I took that as the main point of the article.