rli's comments

rli | 13 years ago | on: Ask HN: How are you dealing with scraping hits from EC2 machines?

robots.txt can be ignored; it's just a reference for honest spiders. I think the approach described above, of listing the top requestors, doing some statistics, and then automating the blocking, is indeed the best way. There may also be a blocklist or two of malicious scrapers floating around, and if there isn't, that's a new business proposal.
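The count-then-block approach could be sketched roughly like this; note that the log format, the request threshold, and the iptables invocation are all assumptions for illustration, not anything specified in the thread:

```python
from collections import Counter
import re
import subprocess

# Assumes a combined-format access log where the first field is the client IP.
CLIENT_IP = re.compile(r'^(\S+) ')

def top_requestors(log_lines, threshold=1000):
    """Count requests per client IP and return those at or above threshold,
    busiest first."""
    counts = Counter()
    for line in log_lines:
        m = CLIENT_IP.match(line)
        if m:
            counts[m.group(1)] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]

def block(ip):
    """Drop all traffic from ip via iptables (requires root)."""
    subprocess.run(["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"],
                   check=True)

if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [01/Jan/2012:00:00:00 +0000] "GET / HTTP/1.1" 200 123',
        '10.0.0.1 - - [01/Jan/2012:00:00:01 +0000] "GET /a HTTP/1.1" 200 123',
        '10.0.0.2 - - [01/Jan/2012:00:00:02 +0000] "GET / HTTP/1.1" 200 123',
    ]
    print(top_requestors(sample, threshold=2))  # → [('10.0.0.1', 2)]
```

In practice you'd run this from cron over a rolling window of the log, and probably whitelist known-good crawlers before calling `block`.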