item 45711478

asgerhb | 4 months ago

Not to mention they have to store the data after they download it. In theory, storing garbage data is costly to them. However, I have a nagging feeling that the attitude of these scrapers is that they get paid the same amount per gigabyte whether it's nonsense or not.


luckylion | 4 months ago

If they even are AI crawlers. It could just as well be exploit scanners searching for endpoints they'd try to exploit. That wouldn't require storing the content, only the links.

m3047 | 4 months ago

If you look at which pages are hit, and how many pages any one address hits in a given period of time, it's pretty easy to identify features that are reliable proxies for e.g. exploit scanners, trawlers, and agents. I publish a feed of what's being hit on my servers; contact me for details (you need to be able to make DNS queries to a particular server, directed at a domain which is not reachable from ICANN's root).
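The idea above can be sketched roughly: bucket requests by client address and time window, then use simple features (request rate, hits on known probe endpoints) as proxies for scanner-like vs. crawler-like traffic. This is a hypothetical illustration, not m3047's actual feed logic; the `PROBE_PATHS` set and thresholds are made-up examples.

```python
from collections import defaultdict

# Paths commonly hit by exploit scanners (illustrative examples only).
PROBE_PATHS = {"/wp-login.php", "/.env", "/phpmyadmin/"}

def classify(requests, window_seconds=60, rate_threshold=30, probe_threshold=1):
    """Label client IPs by access pattern.

    requests: iterable of (ip, timestamp_seconds, path) tuples.
    Thresholds are arbitrary placeholders for the sketch.
    """
    # Group paths by (ip, time-window) bucket.
    buckets = defaultdict(list)
    for ip, ts, path in requests:
        buckets[(ip, int(ts) // window_seconds)].append(path)

    labels = {}
    for (ip, _), paths in buckets.items():
        probes = sum(1 for p in paths if p in PROBE_PATHS)
        if probes >= probe_threshold:
            # Any hit on a known probe endpoint outranks other labels.
            labels[ip] = "exploit-scanner"
        elif len(paths) >= rate_threshold and labels.get(ip) != "exploit-scanner":
            # High sustained request rate in one window suggests a crawler.
            labels[ip] = "crawler"
        else:
            labels.setdefault(ip, "unclassified")
    return labels
```

A real deployment would feed this from server access logs and use many more features (user agents, path entropy, revisit intervals), but the bucketing-by-address-and-window structure is the core of the approach described.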