I do essentially both: robots.txt backed by actual server-level enforcement of the same rules. You'd think the server-level blocking would see zero hits, since crawlers are supposed to read and respect robots.txt, but unsurprisingly they don't always. I don't know why this isn't a standard feature in web hosting.
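The idea is roughly this (a minimal sketch, not anyone's actual setup): drive the server-side deny list from the same robots.txt the crawlers are served, so the two can never drift apart. Python's stdlib parser makes the check a one-liner; the user agents and paths below are just illustrative.

```python
# Sketch: server-level enforcement of the rules already published in
# robots.txt, using the stdlib parser so there is one source of truth.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(user_agent: str, path: str) -> bool:
    """Return False for requests robots.txt says this crawler shouldn't make."""
    return parser.can_fetch(user_agent, path)

# A crawler that honors robots.txt never triggers these denials;
# the server-side check catches the ones that don't.
print(allowed("GPTBot", "/index.html"))       # disallowed everywhere
print(allowed("SomeBot", "/private/x.html"))  # disallowed path
print(allowed("SomeBot", "/index.html"))      # fine
```

In a real deployment the same logic would sit in a middleware or a web-server rule (nginx `map` on `$http_user_agent`, etc.), returning 403 instead of a boolean.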
For my personal stuff I also included a Nepenthes tarpit. It works great and slows the bots down while feeding them garbage. Not my fault if they consume stuff robots.txt says they shouldn't.
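For anyone unfamiliar with the technique: a tarpit serves an endless maze of garbage pages whose links only lead deeper into the maze, so a crawler that ignored robots.txt wastes its time there. Here's a hypothetical sketch of the page generator (not Nepenthes itself, whose real version also drip-feeds bytes slowly); the word list and URL scheme are made up.

```python
# Sketch of a tarpit page generator: every URL under the trap prefix
# yields deterministic filler text plus links further into the maze.
import hashlib
import random

def tarpit_page(path: str, n_links: int = 5) -> str:
    # Seed from the path so the same URL always returns the same page;
    # pages that change on every fetch are easier for crawlers to flag.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    words = ["alpha", "quartz", "meadow", "sprocket", "lattice", "ember"]
    links = "".join(
        f'<a href="{path.rstrip("/")}/{rng.choice(words)}-{rng.randrange(9999)}">more</a>\n'
        for _ in range(n_links)
    )
    filler = " ".join(rng.choice(words) for _ in range(50))
    return f"<html><body><p>{filler}</p>\n{links}</body></html>"

print(tarpit_page("/trap/start")[:120])
```

Serving these with a deliberate delay per chunk is what actually ties the crawler up; the maze just keeps it from leaving.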
I'm just not sure if legal would love me doing that on our corporate servers...
claudiulodro|11 months ago
Joe_Cool|11 months ago
rustc|11 months ago