top | item 26593530

zmarty | 5 years ago

A lot of news websites block every crawler other than Google's, and they don't do it only via robots.txt.
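For the robots.txt half of that, here is a rough sketch of what a "Googlebot only" policy looks like and how a well-behaved crawler would read it. The robots.txt content, the `can_crawl` helper, and the example URL are all made up for illustration and not taken from any real site; the parsing uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story"  # placeholder URL
    print("Googlebot:", can_crawl("Googlebot", url))
    print("SomeOtherBot:", can_crawl("SomeOtherBot", url))
```

This only covers the advisory part. robots.txt is a request that crawlers honor voluntarily, so any blocking beyond it has to be enforced by the server itself.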

simias|5 years ago

Indeed. Years ago I had scripts to automatically fetch URLs from IRC, and I quickly realized that if I didn't spoof the user agent of a proper web browser, many websites would reject the request. Googlebot's UA worked just fine, however.
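For anyone who hasn't done this, a minimal sketch of what setting the user agent amounts to, using only Python's standard library. This is not the original script; `build_request` is a hypothetical helper, the browser string is just one plausible example, and the Googlebot string is the one Google documents for its crawler as far as I know.

```python
from urllib.request import Request

# Example UA strings (assumptions for illustration; real ones change over time).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str) -> Request:
    """Build a GET request that sends user_agent instead of urllib's default."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.com/article", GOOGLEBOT_UA)
    # urllib normalizes header names, so the key is stored as "User-agent".
    print(req.get_header("User-agent"))
    # Passing req to urllib.request.urlopen() would actually send it.
```

Without the explicit header, urllib identifies itself as `Python-urllib/<version>`, which is the kind of default that tends to get rejected. Sites that care can still verify a claimed Googlebot by other means, so the header alone is not guaranteed to work.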