top | item 35702209

weekay | 2 years ago

Why not rely on a robots.txt entry instead of having to explicitly include an HTTP header to opt out of AI scraping?
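A minimal sketch of the contrast being asked about, using only the Python stdlib. The helper names and the exact `X-Robots-Tag` tokens (`noai`, `noindex`) are assumptions for illustration; the precise directives a given scraper honors are its own convention, not a standard.

```python
# Sketch: the two opt-out mechanisms under discussion.
# ASSUMPTIONS: the "noai"/"noindex" tokens and both helper functions are
# illustrative, not a documented API of any particular scraper.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: img2dataset
Disallow: /
"""

def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Standard mechanism: one robots.txt file per site, per user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def header_allows(headers: dict) -> bool:
    """Header-based opt-out: the directive must ride on every response."""
    directives = headers.get("X-Robots-Tag", "").lower()
    return "noai" not in directives and "noindex" not in directives
```

The asymmetry the commenter is pointing at: a single robots.txt line covers the whole site, while the header approach requires configuring the server to emit the opt-out on every response, and requires knowing the specific tag in the first place.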

lexlash|2 years ago

Reading the related GitHub issues, the dev seems to simply not understand HTTP or web-crawling etiquette, even before you get into the "actually AI is good for creators" pitches. The damage is probably done: even if this gets fixed, unethical people building datasets will just use the old versions.

edent|2 years ago

Because - according to the developer - respecting robots.txt is unethical.

His contention is that denying content to AI tools deprives people of their right to better AI tools...

kordlessagain|2 years ago

It’s a straw man argument, which gives you a good look inside the psyche of the dev.

If anything picks up a URL and uses it later, that is definitely a web crawler.

onepointsixC|2 years ago

Seems pretty clear that it's meant to be malicious compliance with consent: consent is automatically assumed unless you say no to this specific scraper, as though there were any reasonable chance millions of sites could know about the exact tag.

beaviskhan|2 years ago

Probably because he knows doing so would make his life harder and give him less data to scrape.

sharemywin|2 years ago

I'd also be curious what headers he sends, like User-Agent.

sharemywin|2 years ago

that's what I was thinking too.