item 45840574 (no title)

__alexander | 3 months ago
Care to share the scraped data? I would love to play around with it.

costco | 3 months ago
Not sure if I can. At the very least, book descriptions most likely could not be distributed. There is an academic dataset with around 200M reviews, though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html

unknown | 3 months ago
[deleted]

saberience | 3 months ago
So you're ok with stealing the data yourself but not ok with providing it to others. Ironic.

guelo | 3 months ago
I'm surprised he got that much data. Goodreads uses several tricks to try to stop scrapers; for example, pagination only works up to a few pages.

jacquesm | 3 months ago
They might send him a bill for use of resources.

demaga | 3 months ago
I am not sure about the legal side of things here, but a Kaggle dataset would be really cool.