Ask HN: Is anyone tracking AI traffic to their site? Should we care?

2 points| ATechGuy | 5 days ago

Lately we've been noticing a non-trivial amount of traffic in our logs that doesn't look like typical bots.

Not the usual noisy crawlers or obvious scrapers. The behavior is different: fewer hits and more selective page access.

Some of the user agents suggest AI crawlers, but some do not. How can we track these visitors?

2 comments

Jonhvmp|1 day ago

Yeah, those selective hits scream custom scrapers or AI data hunters. To track 'em:

- Parse logs: zcat access.log.* | awk '{print $1,$7}' | sort | uniq -c | sort -nr | head -20

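That shows the top IP/path pairs (assuming the common/combined log format, where $1 is the client IP and $7 is the request path). Run whois on the suspicious ones.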

- Add JS fingerprinting (canvas hashing, WebGL) so you can log which visitors are real browsers and which are headless.

- Bait pages with unique content.

Set up alerts on anomalies. Caught some sneaky ones that way!
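
For anyone who would rather do the log-parsing step in a script, here is a rough Python sketch of the same idea. It assumes the combined log format (client IP first, user agent as the last quoted field). The `summarize` function and the `AI_UA_TOKENS` list are illustrative names, not from the thread, and the token list is a small non-exhaustive sample of crawler user-agent strings, so it will miss anything that hides its identity.

```python
import re
from collections import Counter, defaultdict

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative, non-exhaustive sample of self-identifying AI crawler tokens.
AI_UA_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider")


def summarize(lines):
    """Return (hits per IP, distinct paths per IP, IPs whose UA names an AI crawler)."""
    hits = Counter()
    paths = defaultdict(set)
    ai_ips = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        ip, path, ua = m.group("ip"), m.group("path"), m.group("ua")
        hits[ip] += 1
        paths[ip].add(path)
        ua_lower = ua.lower()
        for token in AI_UA_TOKENS:
            if token.lower() in ua_lower:
                ai_ips[ip] = token
                break
    return hits, paths, ai_ips


if __name__ == "__main__":
    import sys

    # Example: zcat -f access.log* | python summarize_logs.py
    hits, paths, ai_ips = summarize(sys.stdin)
    for ip, count in hits.most_common(20):
        label = ai_ips.get(ip, "-")
        print(f"{count:6d}  {ip:<40}  distinct_paths={len(paths[ip])}  ai_ua={label}")
```

The "few hits, selective paths" visitors from the original post would show up here as IPs with a low hit count and no matching token, so the output is a starting point for the whois step, not a verdict.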

mmarian|5 days ago

I'd start with the "why care?" question first.