Not the exact same problem, but a few months ago, I tried to block youtube traffic from my home (I was writing a parental app for my child) by IP. After a few hours of trying to collect IPs, I gave up, realizing that YouTube was dynamically load-balanced across millions of IPs, some of which also served traffic from other Google services I didn't want to block.
I wouldn't be surprised if it was the same with LLMs. Millions of workers allocated dynamically on AWS, with varying IPs.
In my specific case, as I was dealing with browser-initiated traffic, I wrote a Firefox add-on instead. No such shortcut for web servers, though.
They rely on residential proxies powered by botnets — often built by compromising IoT devices (see: https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-fro... ). In other words, many AI startups — along with the corporations and VC funds backing them — are indirectly financing criminal botnets.
You cannot block LLM crawlers by IP address, because some of them use residential proxies. Source: 1) a friend admins a slightly popular site and has decent bot detection heuristics, 2) just Google “residential proxy LLM”, they are not exactly hiding. Strip-mining original intellectual property for commercial usage is big business.
How does this work? Why would people let randos use their home internet connections? I googled it but the companies selling these services are not exactly forthcoming on how they obtained their "millions of residential IP addresses".
Are these botnets? Are AI companies mass-funding criminal malware companies?
so user either has a malware proxy running requests without being noticed or voluntarily signed up as a proxy to make extra $ off their home connection. Either way I dont care if their IP is blocked. Only problem is if users behind CGNAT get their IP blocked then legitimate users may later be blocked.
edit: ah yes another person above mentioned VPN's thats a good possibility, also another vector is users on mobile can sell their extra data that they dont use to 3rd parties. probably many more ways to acquire endpoints.
Yoric|3 months ago
I wouldn't be surprised if it was the same with LLMs. Millions of workers allocated dynamically on AWS, with varying IPs.
In my specific case, as I was dealing with browser-initiated traffic, I wrote a Firefox add-on instead. No such shortcut for web servers, though.
m3047|3 months ago
* https://www.dnsrpz.info/
* https://github.com/m3047/rear_view_rpz
bonsai_spool|3 months ago
adobrawy|3 months ago
strogonoff|3 months ago
skrebbel|3 months ago
Are these botnets? Are AI companies mass-funding criminal malware companies?
globalnode|3 months ago
edit: ah yes another person above mentioned VPN's thats a good possibility, also another vector is users on mobile can sell their extra data that they dont use to 3rd parties. probably many more ways to acquire endpoints.
ninja3925|3 months ago