Perhaps add a challenge (captcha or similar) when the client's pattern looks malicious (requests per minute, average time on page, other stats)? This might not entirely block the crawler, but it still yields data for blocking the source outright if it's a cloud IP range, and the friction itself slows the crawl down. A rough sketch of the idea below.
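A minimal sketch of that heuristic in Python, assuming an in-memory sliding window per client; the thresholds and the `should_challenge` name are made up here and would need tuning against real traffic:

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds -- tune against real traffic.
MAX_REQUESTS_PER_MIN = 60
MIN_AVG_SECONDS_ON_PAGE = 2.0
WINDOW_SECONDS = 60

# Per-client request timestamps within the sliding window.
_requests = defaultdict(deque)

def should_challenge(client_id: str, now: float | None = None) -> bool:
    """Record a request and decide whether to serve a captcha.

    A client gets challenged when its request rate exceeds the limit
    or the average gap between page loads looks too short for a human.
    """
    now = time.monotonic() if now is None else now
    window = _requests[client_id]
    window.append(now)

    # Drop timestamps that fell out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()

    if len(window) > MAX_REQUESTS_PER_MIN:
        return True

    # Average gap between requests approximates "time on page".
    if len(window) >= 2:
        avg_gap = (window[-1] - window[0]) / (len(window) - 1)
        if avg_gap < MIN_AVG_SECONDS_ON_PAGE:
            return True

    return False
```

In production this would live behind a shared store (e.g. Redis) rather than process memory, but the decision logic stays the same.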
The other way around would be to publicly display a degraded copy of the content (in whatever way) and deliver the full, high-quality one only after signing in (see the sketch below). Overall, the point is simply to make life a bit harder for the crawler, whether AI-driven or a plain old hardcoded one.
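A toy illustration of that gating in Python; the `render_article` function and truncation length are invented for this example, and "degraded" could just as well mean lower-resolution images or summaries:

```python
def render_article(article: dict, signed_in: bool) -> str:
    """Serve the full text to signed-in users, a degraded copy otherwise.

    'Degraded' here is plain truncation with a sign-in prompt; swap in
    whatever downgrade fits the content (low-res media, summaries, delays).
    """
    if signed_in:
        return article["body"]
    # Anonymous visitors (and most crawlers) only get a preview.
    preview = article["body"][:500]
    return preview + "\n\n[Sign in to read the full article.]"
```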