camkego | 11 days ago
https://www.kaggle.com/datasets/shubhammaindola/harry-potter...
More than just using the data, linking to a copy that claims the dataset is public domain seems like it would be problematic copyright-wise.
Also interesting: this blog post has been up since November of 2024. It's very surprising to me that Microsoft hasn't taken it down yet.
throwaway2037|10 days ago
When I try to fill out the questionnaire, my request is rejected with this message:
Hysterical. What a farce. That data set is pure theft.
throawayonthe|10 days ago
(e.g. see youtube, where this is (used to be?) poorly enforced, it's a mess)
Sohcahtoa82|10 days ago
nonfamous|10 days ago
ChoGGi|9 days ago
fxwin|11 days ago
Would it? Sounds to me like the blame lies on the person uploading the dataset under that license, unless there is some reasonable person standard applied here like 'everyone knows Harry Potter, and thus they should know it is obviously not CC0'
DSMan195276|10 days ago
Yes, there's an expectation that you put in some minimum amount of effort. The license issue here is not subtle: the Kaggle page says they just downloaded the eBooks and converted them to txt. The author is clearly familiar enough with HP to know that it's not old enough to be public domain, and the Kaggle page makes it pretty clear that they didn't get some kind of special permission.
If you want to get more specific on the legal side, copyright infringement does not require that you _knew_ you were infringing on the copyright. It's still infringement either way, and you can be made to pay damages. It's entirely on you to verify the license.
Retr0id|11 days ago
Why wouldn't that apply?
rob_c|10 days ago
pavon|10 days ago