blt | 11 days ago

What makes this different from linking to a random zip file somewhere?

zythyx|11 days ago

Microsoft could have used any dataset for their blog; they could even have chosen to use actual public domain novels. Instead, they opted to use copyrighted works that JK hasn't released into the public domain (unless user "Shubham Maindola" is JK's alter ego).

bossyTeacher|11 days ago

Rowling is known for using pseudonyms. Maybe she got tired of writing and decided to break into LLM tech.

fxwin|11 days ago

The licensing: if I steal something and tell you it's free and yours for the taking, that feels different from a fence (knowingly) buying stolen goods. It's obviously semantics, and there should have been better judgement from MS, but downloading a dataset (stated as public domain) from Kaggle feels spiritually different from piracy. E.g., if someone uploads a lesser-known, copyrighted dataset to Kaggle/Hugging Face under an incorrect license, are tutorials that use this dataset a 'guide to pirating' it? To me, that feels like a wrong use of the term.

Lerc|11 days ago

The licence?

If it comes from a site claiming it was under a licence when it was not, the misdeed is done by the person who provided the version carrying the licence.

wongarsu|11 days ago

Just because it says "CC0" does not make it CC0. If you upload a dataset you don't have the rights to, any license declaration you make is null and void, and anyone using it as if it had that license is violating copyright.

Even if MS could claim that they were acting in good faith, there really isn't much legal wiggle room for that. But it doesn't even come to that, because I don't think anyone would buy that they really thought the Harry Potter books were under CC0.

slopinthebag|11 days ago

Oh come on. The licence was obviously incorrect, and you can't escape culpability because of that.

philipwhiuk|11 days ago

The 'artwork' they generated and the text on the blog post?