diffeomorphism | 3 months ago

The robots.txt is pretty explicit that this scraping is "disallowed":

https://www.goodreads.com/robots.txt

So legalities aside, this seems unethical.

sputr|3 months ago

Why would it be unethical?

This obsession with "everything must be commercialized" is really killing creativity.

Now if the author were commercializing other people's reviews, sure, it's potentially(!) unethical. But scraping a website for reviews that are publicly(!) posted, training a recommendation LLM and then sharing it, for free, seems ... exactly the ideal use case for this technology.

paulnpace|3 months ago

It is truly criminal that such a bright and brilliant model of ethics, Amazon, should endure such an attack.

diffeomorphism|3 months ago

Unethical behavior does not become good just because it happens to hurt "bad people" (or more accurately, companies bought by bad people).

galdauts|3 months ago

I agree. As a frequent reviewer on Goodreads, this feels really icky.

psandor|3 months ago

You are right.

At the same time, everything you ever posted online has already been scraped by hundreds (maybe thousands) of entities and distributed/sold to countless other entities. The only difference is that OP shared his project here.

contravariant|3 months ago

If it's unethical, it's not because of what the robots.txt says.

Blindly violating it is bad manners, but deliberately scraping a single website over a month isn't the worst.