
AbacusAvenger | 3 years ago

I looked into doing something like this once and decided it wasn't going to be very effective, for a few different reasons.

JS engines (or even WASM) aren't going to be as fast at this kind of work as native machine code would be. Especially when you consider that libraries like OpenSSL have heavily tuned implementations of the SHA algorithms. Any bot solving a SHA-based challenge would be able to extract the challenge from the page and execute it using native machine code faster than any legitimate user's browser could. And if you increase the difficulty of the challenge, it's just going to punish real users running the challenge in their browser more than it would the bots.
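To make the asymmetry concrete, here is a hedged sketch of how cheaply a bot can grind a hashcash-style SHA challenge outside the browser (this is an illustrative challenge format, not any specific product's protocol; Python's hashlib calls into OpenSSL's tuned native SHA code):

```python
# Illustrative hashcash-style challenge: find a nonce such that
# SHA-256(prefix + nonce) is below a target derived from `bits`.
# hashlib is backed by OpenSSL's native SHA implementation, so a bot
# running this loop natively outpaces any in-browser JS/WASM solver.
import hashlib

def solve(prefix: str, bits: int) -> int:
    target = 1 << (256 - bits)  # digest must be numerically below this
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{prefix}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# ~2**16 hashes expected; a real challenge would be harder still
nonce = solve("challenge-token:", 16)
```

Raising `bits` by one doubles the expected work for everyone, which is exactly why difficulty increases hit browser users harder than native solvers.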

It's also based on the assumption that proof-of-work is going to increase the cost of doing business for the bots in some way and discourage their behavior. Many of the bots I was dealing with in my case were either using cloud compute services fraudulently or were running on compromised machines of unknowing people. And they tended not to care about how long it took or how high-effort the challenge was, they were very dedicated at getting past it and continuing their malicious behavior.

There's also the risk that a sufficiently difficult challenge will make the user's browser complain that a script is unresponsive or is eating tons of CPU, behavior that isn't much different from a cryptocurrency miner's.

jsnell|3 years ago

Yes, the range of applications where proof of work is viable is really narrow. (In fact, so narrow that I suspect it can't work for anything where the abuse has a monetary motive.)

One way to think about this is by comparing the cost of passing the POW to the money the same compute resources would make when mining a cryptocurrency. I believe that a low-end phone used for mining a CPU-based cryptocurrency would be making O(1 cent) per day. Let's say that you're willing to cause 1 minute of friction for legit users on low-end devices (already something that I'd expect will be unacceptable from a product perspective). Congratulations: you just cost the attacker 1/1500th of a cent. That's orders of magnitude too low to have any impact on the economics of spam, credential stuffing, scraping, or other typical bulk abuse.
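The arithmetic behind that figure checks out (the 1 cent/day earning rate is the comment's own assumption):

```python
# Back-of-the-envelope check: if a low-end phone earns ~1 cent/day
# mining a CPU coin, what does 1 minute of its compute cost an attacker?
cents_per_day = 1.0
minutes_per_day = 24 * 60              # 1440
cost_per_minute = cents_per_day / minutes_per_day
print(f"{cost_per_minute:.5f} cents")  # ~0.00069 cents, roughly 1/1500th
```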

jchw|3 years ago

Yep, I have also done this and come to a similar conclusion. The best performance I got was with the WebCrypto API, where I got, IIRC, 100,000 SHA512 hashes a second, with a very optimized routine for iterating the hashcash string. SHA512 probably makes the most sense since AFAIK it's the most intensive hash supported by WebCrypto, and the WebCrypto digest API is async, so a whole lot of time is going to be spent awaiting no matter what you do.

I think WebCrypto is available in workers, so you could probably use numerous workers to get a better hashrate. Still, I suspect that wouldn't jump it past a million, which seems pretty bad for a desktop computer, and it would be a lot worse on a mobile phone.
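For comparison with the ~100k/s WebCrypto figure, here is a rough way to measure a native SHA-512 hashrate (a quick sketch; numbers vary wildly by machine, and the fixed short input understates real hashcash iteration overhead):

```python
# Crude native SHA-512 hashrate measurement via OpenSSL-backed hashlib,
# for eyeballing against in-browser WebCrypto throughput.
import hashlib
import time

def hashes_per_second(duration: float = 0.25) -> float:
    count = 0
    payload = b"hashcash-string-0000000"  # fixed short input
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        hashlib.sha512(payload).digest()
        count += 1
    return count / duration
```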

It might still be a meaningful impediment when combined with other measures, but with the low bar you'd have to set for preimage bits for mobile devices, it's a little questionable.

realaravinth|3 years ago

Thank you for your detailed response, you raise some very interesting and valid points!

> JS engines (or even WASM) aren't going to be as fast at this kind of work as native machine code would be

You are right. mCaptcha has WASM and JS polyfill implementations. Native code will definitely be faster than WASM, but in an experiment I ran for fun[0], I discovered that the WASM was only roughly 2s slower than the native implementation.

> It's also based on the assumption that proof-of-work is going to increase the cost of doing business

mCaptcha is basically a rate-limiter. If an expensive endpoint (say registration, where hashing + other validation is expensive) can handle 4k requests/second and has mCaptcha installed, then the webmaster can force the attacker to slow down to 1 request/second, significantly reducing the load on their server. That isn't to say that the webmaster will be able to protect themselves against a sufficiently motivated attacker who has botnets. :)
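The rate-limiting argument can be put in numbers (an illustrative model, not mCaptcha's actual API: assume an average-case hashcash challenge of `2**bits` hashes and a single-core attacker):

```python
# If a core does `hashrate` hashes/s and each challenge costs ~2**bits
# hashes on average, each solve pins the core for solve_time seconds,
# capping that core at 1/solve_time requests per second.
def max_requests_per_second(hashrate: float, bits: int) -> float:
    expected_hashes = 2 ** bits
    solve_time = expected_hashes / hashrate
    return 1.0 / solve_time

# e.g. ~1M hashes/s against a 20-bit challenge -> ~1 request/s per core
```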

> There's also the risk that any challenge that's sufficiently difficult may also make the user's browser angry that a script is either going unresponsive or eating tons of CPU, which isn't much different from cryptocurrency miner behavior.

Also correct. The trick is in finding the optimum difficulty that will work for the majority of devices. A survey to benchmark PoW performance of devices in the wild is a work in progress[1], which will help webmasters configure their CAPTCHA better.
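One way such benchmark data could feed into a difficulty setting (a hypothetical helper, not part of mCaptcha: pick the hardest challenge the slowest supported device can still solve within your latency budget):

```python
import math

# Given the slowest hashrate you want to support and the longest solve
# time you'll tolerate, choose the largest difficulty whose expected
# work (~2**bits hashes) still fits in that budget.
def pick_bits(slowest_hashrate: float, max_seconds: float) -> int:
    budget = slowest_hashrate * max_seconds  # hashes the device can do
    return max(0, math.floor(math.log2(budget)))

# e.g. a phone doing 50k hashes/s with a 2 s budget -> a 16-bit challenge
```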

[0]: https://mcaptcha.org/blog/pow-performance Benchmarking platforms weren't optimised for running benchmarks, kindly take it with a grain of salt. It was a bored Sunday afternoon experiment.

[1]: https://github.com/mcaptcha/survey

Full disclosure: I'm the author of mCaptcha

tusharsoni|3 years ago

> mCaptcha is basically a rate-limiter

This is a much better explanation of what it does than "captcha", where I expect "proof-of-human". A PoW-based rate-limiter is a really interesting idea! Usually, the challenge with unauthenticated endpoints (ex. signups) is that the server has to do more work (DB queries) than the client (making an HTTP request), so it is really easy for the client to bring the server down. With PoW, we're essentially flipping that model: the client has to do more work than the server. Good work!
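The flipped cost model is visible in the verification side of the hashcash-style sketch (again illustrative, not a specific product's protocol): the client grinds ~2**bits hashes, but the server checks the answer with a single hash.

```python
import hashlib

# Server-side check: one SHA-256 computation, regardless of how much
# work the client had to do to find the nonce.
def verify(prefix: str, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(f"{prefix}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))
```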

AbacusAvenger|3 years ago

About the benchmark data:

It looks like your pow_sha256 library is using the "sha2" crate, which is a pure-Rust implementation of SHA-2. So your benchmark measures the delta between your library compiled to native code and your library compiled to WASM, which is an interesting benchmark, but I don't think it answers the right question.

A more interesting benchmark would answer the question "what would those looking to defeat mCaptcha use, and how does that performance compare?" So perhaps an implementation of an mCaptcha challenge solver using OpenSSL would be warranted.

mort96|3 years ago

> mCaptcha is basically a rate-limiter.

Hmm, is it a better rate limiter than others? I know that nginx, for example, makes it pretty easy to rate limit based on IP address with the `limit_req` and `limit_req_zone` directives.

In essence, nginx's rate limiter also works by making each request consume a resource, but the resource consumed is an IP address (or range) rather than compute. It seems intuitive that a malicious actor would have an easier time scaling compute than IP addresses, while a legitimate user will _always_ have an IP address but might be on a machine with 1/100000th the compute resources of the malicious actor.
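For reference, the nginx setup being alluded to looks roughly like this (zone name, size, and limits are made-up examples; `limit_req_zone` belongs in the `http` context):

```nginx
# Track clients by IP and allow 1 request/second per IP to /signup,
# with a small burst allowance. "backend" is an assumed upstream.
limit_req_zone $binary_remote_addr zone=signup:10m rate=1r/s;

server {
    listen 80;

    location /signup {
        limit_req zone=signup burst=5 nodelay;
        proxy_pass http://backend;
    }
}
```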

aeternum|3 years ago

I'd suggest you consider a new name. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart", and this implementation has little to do with that.

vintermann|3 years ago

Yes, the spam doesn't actually have to be profitable for the seller of the spammed product, as long as he thinks it is and is willing to pay the person spamming on his behalf.

And as you say, it's often stolen resources. There may even be another link in the "think it pays" chain: the spammer may buy hacked instances out of a mistaken belief that he can make money on it, by selling spamming services to merchants who also only think it pays. There's a certain "crime premium" where some people seem willing to pay extra (in money or effort) for the feeling that they're fooling someone.