tigranbs|2 months ago

So the easiest strategy to hamper them, if you know you're serving a page to an AI bot, is simply to strip all the hyperlinks off the page...?
conartist6|2 months ago

That doesn't even sound all that bad if you happen to catch a human. You could even tell them explicitly, with a banner, that they were browsing the site in no-links mode for AI bots. Put one link to an FAQ page in the banner, since that at least is easily cached.
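For concreteness, a minimal sketch of that no-links mode, assuming a Python backend with BeautifulSoup available. The User-Agent markers, banner markup, and function name are all illustrative, not a definitive bot-detection scheme:

    # Illustrative sketch of "no-links mode": if the User-Agent looks
    # like an AI crawler, unwrap every hyperlink and prepend a banner
    # whose single link points at an easily cached FAQ page.
    # Assumes: pip install beautifulsoup4
    from bs4 import BeautifulSoup

    AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "CCBot")  # illustrative UA substrings

    BANNER_HTML = (
        '<div class="no-links-banner">You are browsing in no-links mode '
        'for AI bots. <a href="/faq">FAQ</a></div>'
    )

    def strip_links_for_bots(html: str, user_agent: str) -> str:
        if not any(marker in user_agent for marker in AI_BOT_MARKERS):
            return html  # humans get the normal page
        soup = BeautifulSoup(html, "html.parser")
        # Replace each anchor with its plain text, removing the hyperlink.
        for a in soup.find_all("a"):
            a.replace_with(a.get_text())
        # The banner keeps its one link to the FAQ page.
        if soup.body:
            soup.body.insert(0, BeautifulSoup(BANNER_HTML, "html.parser"))
        return str(soup)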
FieryMechanic|2 months ago

When I used to build these scrapers for people, I would usually pretend to be a browser. This normally meant changing the UA and making the headers look like a real browser's. Obviously this would fail against more advanced bot-detection techniques.
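Roughly what that looks like, as a minimal sketch with Python's requests library; the URL and header values here are illustrative stand-ins:

    # Sketch of the header-spoofing approach: send the UA and the
    # companion headers a real desktop Chrome would send.
    # Assumes: pip install requests
    import requests

    BROWSER_HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-GB,en;q=0.9",
        "Referer": "https://www.google.com/",
    }

    def fetch(url: str) -> str:
        # A session keeps cookies across requests, which also looks
        # less bot-like than a fresh connection per page.
        with requests.Session() as session:
            session.headers.update(BROWSER_HEADERS)
            response = session.get(url, timeout=10)
            response.raise_for_status()
            return response.text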
Failing that, I would use Chrome / PhantomJS or similar to browse the page in a real headless browser.
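PhantomJS is no longer maintained, but the same fallback today might use Selenium driving headless Chrome, along these lines (assumes Selenium 4+ and a local Chrome install; the URL is a placeholder):

    # Sketch of the headless-browser fallback with Selenium + Chrome.
    # Assumes: pip install selenium  (Selenium Manager fetches the driver)
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def fetch_rendered(url: str) -> str:
        options = Options()
        options.add_argument("--headless=new")  # no visible window
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            # Unlike a plain HTTP fetch, this returns the DOM after
            # JavaScript has run, which is the point of the fallback.
            return driver.page_source
        finally:
            driver.quit()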
FieryMechanic|2 months ago

I was collecting UK bank account sort code numbers (buying that as a database at the time cost a huge amount of money). I had spent a bunch of time using asyncio to speed up the scraping and wondered why it was still going so slow; it turned out I had left Fiddler profiling in the background.
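The asyncio fan-out in question would have looked something along these lines; this sketch uses aiohttp, and the URL list is a placeholder rather than the real scraper's targets:

    # Sketch of an asyncio-based scraper fan-out using aiohttp.
    # Assumes: pip install aiohttp
    import asyncio
    import aiohttp

    async def fetch(session: aiohttp.ClientSession, url: str) -> str:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()

    async def scrape(urls: list[str]) -> list[str]:
        # One shared session, many concurrent requests: this is where
        # the speed-up over sequential scraping comes from.
        async with aiohttp.ClientSession(
            timeout=aiohttp.ClientTimeout(total=10)
        ) as session:
            return await asyncio.gather(*(fetch(session, u) for u in urls))

    pages = asyncio.run(scrape(["https://example.com"] * 5))

Fiddler, being a local intercepting proxy, funnels every request through itself, which is why leaving it running quietly throttled all that concurrency.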