xibernetik | 13 years ago

Not really... Even if the code is difficult to patch, speech/audio recognition doesn't advance much when an attacker figures out how to remove the (non-random) noise that a machine added over the sound file. Actual speech recognition relies on the ability to filter out background noise added by the surroundings, not by a machine, and that noise is a lot more complex and random.

It's very difficult to algorithmically generate noise that a) humans can filter out and b) no algorithm can remove. As a result, audio captchas are a huge vulnerability and the weakest link in almost any captcha system, although legally you can't get rid of them.

Hypotheticals aside, the code was easy to patch. Note the footnote:

> In the hours before our presentation/release, Google pushed a new version of reCAPTCHA which fully nerfs our attack.

d2vid | 13 years ago

Could one take real recorded noise and add that, rather than noise generated by an algorithm? Wouldn't that force attackers to solve a real problem (removing background noise from a speech sample)?

xibernetik | 13 years ago

It's not really solving the "real" problem... If I'm just mashing two audio files together, that's going to be different from someone talking in the middle of a train platform, and there will likely be algorithmically determinable differences between the artificially generated words and the naturally recorded noise.

All of this aside, removing background noise is not a huge issue anymore. We have pretty decent noise-cancellation technology. Speech recognition - the other big component - has advanced a lot in recent times and is actually pretty good, although not for every company/product.

Even if it would be helpful, you'd have to record an incredible amount of noise in the first place: you're getting millions of hits a day, and if you have a small sample set, the attackers will just figure out the solutions to that sample set and be done.
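
To put rough numbers on the sample-set concern, here is a back-of-the-envelope sketch in Python. The 10 hours of recordings, the 10-second clip length, and the cut into fixed, non-overlapping clips are all illustrative assumptions; only the "millions of hits a day" scale comes from the comment above. `reuse_per_day` is a hypothetical helper name.

```python
def reuse_per_day(hours_recorded, clip_seconds, challenges_per_day):
    """Estimate how often each noise clip repeats per day.

    Assumes (for illustration only) that the recording is cut into fixed,
    non-overlapping clips and that challenges draw from them uniformly.
    """
    distinct_clips = int(hours_recorded * 3600 // clip_seconds)
    if distinct_clips == 0:
        raise ValueError("recording is shorter than a single clip")
    return distinct_clips, challenges_per_day / distinct_clips


# Illustrative numbers: 10 hours of noise, 10-second clips, 1M challenges/day.
clips, reuse = reuse_per_day(10, 10, 1_000_000)
# clips == 3600, and each clip would be served roughly 278 times a day.
```

Under those assumptions every clip shows up hundreds of times a day, which is the kind of repetition that lets an attacker catalogue the whole set.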

I'm not saying it's impossible, but I am saying it's probably not worth it at this point. Captchas (in their traditional forms) don't make sense as a long-term strategy anyways.

robryan | 13 years ago

Yeah, you would think they could record thousands of hours of real-world noise and then randomly use sections of it on each audio captcha.
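
A minimal sketch of that idea, assuming audio is held as plain lists of float samples at a shared sample rate. The function names and the signal-to-noise parameter are hypothetical choices for illustration, not how reCAPTCHA or any real captcha system works.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_random_noise(speech, noise_pool, snr_db, rng=random):
    """Overlay a randomly chosen slice of noise_pool on speech.

    The slice is scaled so the speech-to-noise ratio equals snr_db.
    Returns the mixed samples and the offset that was picked.
    """
    if len(noise_pool) < len(speech):
        raise ValueError("noise pool must be at least as long as the speech clip")
    # Pick a random window of the recording, the same length as the speech.
    start = rng.randrange(len(noise_pool) - len(speech) + 1)
    noise = noise_pool[start:start + len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech), start  # silent window: nothing to add
    # Scale the noise so that speech_rms / scaled_noise_rms hits the target SNR.
    gain = rms(speech) / (10 ** (snr_db / 20)) / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)], start
```

Random offsets give many more distinct windows than fixed clips, but the windows overlap heavily, so this alone may not answer the point above about attackers separating synthetic speech from recorded noise.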