top | item 43669126

(no title)

prologic | 10 months ago

I've read about Anubis, cool project! Unfortunately, as pointed out in the comments, it requires your site's visitors to have Javascript™ enabled. This is totally fine for sites that already require Javascript™ to enhance the user experience, but not so great for static sites and the like that need no JS at all.

I built my own solution that blocks these "Bad Bots" at the network level. I block several large "Big Tech / Big LLM" networks in their entirety at the ASN (BGP) level, using MaxMind's database and a custom WAF and reverse proxy I put together.


xyzzy_plugh|10 months ago

A significant portion of the bot traffic TFA is designed to handle originates from consumer/residential space. Sure, there are ASN games being played alongside reputation fraud, but it's very hard to combat. A cursory investigation of our logs showed these bots (which make ~1 request from a given residential IP) are likely in ranges that our real human users occupy as well.

Simply put, you risk blocking legitimate traffic. This solution does as well, but for most humans the actual risk is much lower.

As much as I'd love to not need JavaScript and to support users who run with it disabled, I've never once had a customer or end user complain about needing JavaScript enabled.

It is an incredibly vocal minority who disapprove of requiring JavaScript, the majority of whom, upon encountering a site for which JavaScript is required, simply enable it. I'd speculate that, even then, only a handful ever release a defeated sigh.

prologic|10 months ago

This is true. I had some bad actors from the Comcast network at one point, and unfortunately also valid human users of some of my "things". So I opted not to block the Comcast ASN at that point.

Cyphase|10 months ago

For anyone wondering, Oracle holds the trademark for "JavaScript": https://javascript.tm/

prologic|10 months ago

Which arguably they should let go of

jadbox|10 months ago

How do you know it's an LLM and not a VPN? How do you use the MaxMind database to isolate LLMs?

prologic|10 months ago

I don't distinguish actually. There are two things I do normally:

- Block Bad Bots. There's a simple text file called `bad_bots.txt`
- Block Bad ASNs. There's a simple text file called `bad_asns.txt`

There's also another file for blocking IPs and IP ranges called `bad_ips.txt`, but it's often more effective to block a much larger range of IPs (at the ASN level).
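For anyone who wants to picture the IP and bot checks, here is a minimal Python sketch. It is not the actual WAF: the file formats (one IP or CIDR range per line in `bad_ips.txt`, one User-Agent substring per line in `bad_bots.txt`, `#` for comments) and the function names `parse_ip_rules`, `is_blocked_ip`, and `is_bad_bot` are all assumptions made for illustration.

```python
import ipaddress


def parse_ip_rules(lines):
    """Parse entries such as '203.0.113.7' or '198.51.100.0/24'.

    Assumed format: one entry per line, '#' starts a comment, blanks ignored.
    """
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # A bare address becomes a /32 (or /128) network.
            # strict=False tolerates host bits, e.g. '198.51.100.5/24'.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked_ip(client_ip, networks):
    """True if client_ip falls inside any listed address or range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


def is_bad_bot(user_agent, patterns):
    """Case-insensitive substring match against the User-Agent header.

    Substring matching is an assumption; the real file may use another scheme.
    """
    ua = user_agent.lower()
    return any(p.lower() in ua for p in patterns)


rules = parse_ip_rules(["# single host", "203.0.113.7", "198.51.100.0/24  # range"])
print(is_blocked_ip("198.51.100.42", rules))
print(is_bad_bot("Mozilla/5.0 (compatible; ExampleBot/1.0)", ["examplebot"]))
```

A list like this only covers the addresses you have already seen, which is why blocking a whole ASN tends to catch far more.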

To give you a concrete idea, here are some examples:

    $ cat etc/caddy/waf/bad_asns.txt
    # CHINANET-BACKBONE No.31,Jin-rong Street, CN
    # Why: DDoS
    4134

    # CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN
    # Why: DDoS
    4837

    # CHINAMOBILE-CN China Mobile Communications Group Co., Ltd., CN
    # Why: DDoS
    9808

    # FACEBOOK, US
    # Why: Bad Bots
    32934

    # Alibaba, CN
    # Why: Bad Bots
    45102

    # Why: Bad Bots
    28573
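A rough Python sketch of how a file in that shape could be consumed, assuming each ASN sits on its own line below its `#` comments. The names `parse_bad_asns`, `should_block`, and `lookup_asn` are hypothetical. In a real setup the IP-to-ASN lookup would come from something like MaxMind's ASN database; here it is stubbed with a dictionary so the example runs on its own.

```python
def parse_bad_asns(text):
    """Return {asn: reason}, reading the reason from a preceding '# Why:' line.

    Assumes one ASN number per non-comment line; anything else raises ValueError.
    """
    blocked = {}
    reason = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.lower().startswith("why:"):
                reason = body[len("why:"):].strip()
            continue
        blocked[int(line)] = reason
        reason = None  # a reason applies only to the ASN that follows it
    return blocked


def should_block(client_ip, blocked, lookup_asn):
    """lookup_asn is any callable mapping an IP string to an ASN or None."""
    asn = lookup_asn(client_ip)
    return asn is not None and asn in blocked


SAMPLE = """\
# FACEBOOK, US
# Why: Bad Bots
32934

# Alibaba, CN
# Why: Bad Bots
45102
"""

blocked = parse_bad_asns(SAMPLE)

# Stub lookup with made-up IP-to-ASN pairs; replace with a real ASN database query.
fake_asn_db = {"192.0.2.10": 32934, "198.51.100.20": 64500}
print(should_block("192.0.2.10", blocked, fake_asn_db.get))
print(should_block("198.51.100.20", blocked, fake_asn_db.get))
```

Keeping the reason next to each ASN makes it easier to revisit a block later, for example when a listed network turns out to carry real users.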

runxiyu|10 months ago

Do you have a link to your own solution?

JsonCameron|10 months ago

I have a pretty similar one. (Works off of the same concept) https://github.com/JasonLovesDoggo/caddy-defender if you're curious. Keep in mind this will not protect you against residential IP scraping.

prologic|10 months ago

Not yet unfortunately. But if you're interested, please reach out! I currently run it in a 3-region GeoDNS setup with my self-hosted infra.