13pixels | 7 days ago

Facebook is honestly the least interesting crawler misbehaving right now. The real shift is GPTBot, ClaudeBot, PerplexityBot, and a dozen other AI crawlers, many of which don't even identify themselves half the time.

I've been monitoring server logs across ~150 sites and the pattern is striking: AI crawler traffic increased roughly 8x in the last 12 months, but most site owners have no idea because it doesn't show up in analytics. The bots read everything, respect robots.txt maybe 60% of the time, and the content they index directly shapes what ChatGPT or Perplexity recommends to users.
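For anyone who wants to check their own logs, here is a minimal Python sketch of the kind of count described above. It assumes access logs in the common "combined" format, where the user agent is the last quoted field. The token list, the function name `count_ai_crawler_hits`, and the sample log lines are all illustrative, not taken from the monitoring setup mentioned here. It only catches crawlers that identify themselves, so it will undercount.

```python
import re
from collections import Counter

# User-agent substrings for the crawlers named in this thread.
# Illustrative only; real lists are longer and change over time.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Assumes the "combined" log format: the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(log_lines):
    """Count requests per self-identified AI crawler token."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break
    return counts


# Made-up sample lines, only to show the input shape.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Feb/2025:12:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [10/Feb/2025:12:00:01 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [10/Feb/2025:12:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [10/Feb/2025:12:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0"',
]

if __name__ == "__main__":
    for token, hits in count_ai_crawler_hits(SAMPLE_LOG).items():
        print(token, hits)
```

To run it against a real file, pass an open file handle instead of `SAMPLE_LOG`, since the function accepts any iterable of lines.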

The irony is that robots.txt was designed for a world where crawling meant indexing for search results. Now crawling means training data and real-time retrieval for AI answers. That's a completely different power dynamic, and most robots.txt files haven't adapted.
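As a rough way to see whether a given robots.txt has "adapted", the sketch below uses Python's standard `urllib.robotparser` to list which AI crawlers are still allowed to fetch a path. The sample robots.txt, the `example.com` URL, and the helper name `allowed_ai_crawlers` are hypothetical. Keep in mind that robots.txt is advisory, so this shows what a well-behaved crawler would do, not what actually happens.

```python
from urllib.robotparser import RobotFileParser

# Crawler names from this thread; illustrative, not exhaustive.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Hypothetical robots.txt: a typical older file plus one AI-specific rule.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""


def allowed_ai_crawlers(robots_txt, url, agents=AI_CRAWLERS):
    """Return the agents that this robots.txt still permits to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Only GPTBot has its own rule here; the others fall back to "*".
    print(allowed_ai_crawlers(SAMPLE_ROBOTS_TXT, "https://example.com/post"))
```

In this sample, only the crawler with an explicit rule is blocked from `/post`. The rest inherit the generic `*` rules, which were written with search indexing in mind.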

XCSme|5 days ago

This matches what I've been noticing. A lot of AI crawler traffic just doesn't show up clearly in typical analytics dashboards, especially when tools aggressively filter or sample.

Part of why I built UXWizz was to avoid black-box filtering and keep control over how traffic is classified. When you own the analytics stack, you get to decide what’s "valid" instead of inheriting someone else's definition.