top | item 47068059


fxwin | 11 days ago

I feel like the title is a bit misleading, unless the person who put all the Harry Potter books on Kaggle as a (supposedly) CC0-licensed dataset did so as a Microsoft employee.

Nevertheless, it's a pretty egregious oversight (incompetence?) and something that shouldn't have been published.

blt|11 days ago

What makes this different from linking to a random zip file somewhere?

zythyx|11 days ago

Microsoft could have used any dataset for their blog; they could even have chosen actual public domain novels. Instead, they opted to use copyrighted works that JK hasn't released into the public domain (unless user "Shubham Maindola" is JK's alter ego).

fxwin|11 days ago

The licensing. If I steal something and tell you it's free and yours for the taking, that feels different from a fence knowingly buying stolen goods. It's obviously semantics, and MS should have shown better judgement, but downloading a dataset from Kaggle that is labeled public domain feels spiritually different from piracy. For example, if someone uploads a lesser-known copyrighted dataset to Kaggle or Hugging Face under an incorrect license, are tutorials that use that dataset a "guide to pirating" it? To me, that feels like the wrong use of the term.

Lerc|11 days ago

The licence?

If it comes from a site claiming it was under a licence when it was not, the misdeed was committed by the person who provided the version carrying the licence.

philipwhiuk|11 days ago

The 'artwork' they generated and the text on the blog post?

uyzstvqs|11 days ago

To clarify: Microsoft linked to a dataset on Kaggle, which is falsely labeled CC0 (Public Domain). It's the fault of the user who uploaded the dataset and misrepresented the licensing.

Ekaros|11 days ago

Multiple failures. One on the blog's writer, for even for a moment considering that such a dataset would be legal. The next on MS, for hiring a person with such poor judgement, namely publicly posting about it on a company platform instead of choosing some other dataset.

robrain|11 days ago

The original title was "LangChain Integration for Vector Support for SQL-based AI applications"

ASalazarMX|11 days ago

For some reason I really like this.