acolytic | 6 years ago

I'm not very familiar with the concept of tarpitting. How do they get the bot to run CPU-intensive code? By passing in extra JavaScript? Can this affect a bot that doesn't run any JS?

AznHisoka|6 years ago

I've been running many non-JS crawlers for the past few years, and a few pages kept bringing my servers to a halt with CPU load. When I dug into the source, I saw that the HTML was a convoluted mess of tables inside tables inside tables inside more tables, which made it incredibly time-consuming and CPU-intensive for my DOM parser to parse (I was using Nokogiri, a Ruby gem, at the time). So Cloudflare could be serving these kinds of "fake" pages to bad bots.
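To make the "tables inside tables" idea concrete, here is a minimal sketch in Python (not Ruby/Nokogiri) of what such a decoy page could look like. The `nested_tables` generator and the nesting depth are hypothetical, and nothing here says what Cloudflare actually serves. The standard-library `HTMLParser` is streaming and builds no tree, so it only measures the nesting; the real cost shows up in parsers that build a full DOM at that depth.

```python
from html.parser import HTMLParser

def nested_tables(depth: int) -> str:
    """Build a hypothetical decoy page with `depth` levels of <table><tr><td> nesting."""
    opening = "<table><tr><td>" * depth
    closing = "</td></tr></table>" * depth
    return f"<html><body>{opening}x{closing}</body></html>"

class DepthCounter(HTMLParser):
    """Tracks how deeply <table> elements nest while the page is parsed."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1

page = nested_tables(500)
counter = DepthCounter()
counter.feed(page)
print(len(page), counter.max_depth)
```

A DOM-building parser has to allocate and link a node for every one of those levels, which is where a crawler that parses every page it fetches starts to pay.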

They could also be doing things like serving fake streaming audio that never ends, or anything else that makes the page look like a huge page that just needs more time to load.
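A never-ending "audio stream" is easy to sketch: a generator that keeps producing bytes, so a client that reads until end-of-stream never finishes. This is a hypothetical illustration in Python; the function name and chunk size are made up, and the bytes are random rather than real audio.

```python
import itertools
import os
from typing import Iterator

def endless_fake_audio(chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield random 'audio' bytes forever; a client waiting for EOF never gets one."""
    while True:
        yield os.urandom(chunk_size)

# A client that reads until the stream ends would loop forever; here we take only 3 chunks.
first = list(itertools.islice(endless_fake_audio(1024), 3))
print([len(c) for c in first])
```

Since a WSGI application may return any iterable as its response body, a generator like this could be returned with an audio content type and no `Content-Length`, giving a response that never completes.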

glenngillen|6 years ago

Usually via JavaScript. Many credential-stuffing and similar bots need to run headless browsers these days to do their job. The folks at Kasada (https://www.kasada.io) have talked at a high level about some of their approaches over the years; there should be a few conference presentations on YouTube. They don't go into the finer details, though, as I assume a lot of what they do is secret sauce.

I'm not sure what they do for the non-JS case. They sit in the request path like a CDN, though, so maybe they just return an error or deliberately slow down their responses?
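For the non-JS case, the "deliberately slow response" idea is the classic tarpit: send the real body, but only a few bytes at a time with a pause between pieces. This is a generic sketch in Python, not a description of what Kasada does; the `drip` helper and its delay and chunk parameters are hypothetical.

```python
import time
from typing import Iterator

def drip(body: bytes, delay_s: float, chunk: int = 1) -> Iterator[bytes]:
    """Release `body` `chunk` bytes at a time, sleeping `delay_s` after each piece."""
    for i in range(0, len(body), chunk):
        yield body[i:i + chunk]
        time.sleep(delay_s)

start = time.monotonic()
received = b"".join(drip(b"<html>slow</html>", delay_s=0.01, chunk=4))
elapsed = time.monotonic() - start
print(received, round(elapsed, 2))
```

With realistic settings (say, one byte per second) this costs the server almost nothing per connection while tying up a crawler worker for the whole duration.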

nodja|6 years ago

I wonder if it's possible to send a compressed response that uses tons of CPU. Like a zip bomb that eats CPU time instead of memory or disk space.
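A gzip "bomb" is the closest standard-library analogue. The sketch below in Python compresses 10 MB of zeros and then inflates it again. Inflating work grows with the size of the output, so a client that honors `Content-Encoding: gzip` spends CPU (and memory) on the full 10 MB even though only a few kilobytes were sent. A single DEFLATE layer tops out at roughly 1000:1, so this is an amplifier rather than a true CPU bomb; the sizes here are arbitrary.

```python
import gzip

# 10 MB of zeros compresses extremely well with DEFLATE.
payload = b"\0" * 10_000_000
bomb = gzip.compress(payload, compresslevel=9)

# The receiving client has to do all the work of inflating the full 10 MB.
inflated = gzip.decompress(bomb)
print(len(bomb), len(inflated), len(inflated) // len(bomb))
```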

s09dfhks|6 years ago

I was also curious about this step. I assume they won't reveal the nitty-gritty details for fear of botters coding around them, but I am curious how you can "make them use more CPU" while they crawl a website.

jsnell|6 years ago

Serve the suspected bots a page with JavaScript that computes a proof of work, submits it, and gets the real page in return.
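A hashcash-style puzzle is one common way to build such a proof of work, though neither Cloudflare nor Steamspy is confirmed to use this exact scheme. In practice the solving side would run as JavaScript in the browser; the Python sketch below shows both sides, and the names `solve`, `verify`, the 8-byte nonce encoding, and the difficulty value are all hypothetical choices. The asymmetry is the point: the client needs about `2**difficulty` SHA-256 hashes, while the server checks the answer with one.

```python
import hashlib
import os

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce so SHA-256(challenge || nonce) has `difficulty` leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks work that cost the client ~2**difficulty hashes."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

challenge = os.urandom(16)  # hypothetically issued per visitor, e.g. tied to a fresh cookie
nonce = solve(challenge, difficulty=16)
print(verify(challenge, nonce, difficulty=16))
```

Raising `difficulty` by one bit doubles the expected client work, which lets a server dial up the cost for traffic it considers more suspicious.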

Steamspy.com seems to trigger one of these basically every time it's loaded with a fresh cookie.

nielsbot|6 years ago

Maybe it's JavaScript?