

xodjmk | 2 years ago

I think it depends on what you believe. Only the title and first paragraph mention 'pirated'. The actual complaints are simply about using copyrighted material. Some people, like Stephen King and Sarah Silverman, who are mentioned in the article, believe that just using their content to train a machine learning algorithm is somehow stealing their work. I couldn't give a shit less about their opinions, but it is a belief that many people hold. The types of machine learning algorithms involved, if they are done properly, will never store a single bit or pixel of the original data; this is not how they work. My view is that training is equivalent to perceiving one's surroundings and storing impressions and abstracted memories of those perceptions. So when someone paints a painting, it is not stealing to view the painting and store an impression of it in your mind. If someone scrapes the entire internet for all published data and uses that to train an A.I., I don't consider that stealing or pirating. If they hacked into someone's personal computer, then that would be pirating.


fatfingerd|2 years ago

The two claims I heard from Sarah Silverman are that a specific AI was documented to have been trained on a specific library of pirated content, and that it gave verbatim or near-verbatim output from her book. For non-corporate humans, that would be piracy and plagiarism.

xodjmk|2 years ago

If this is true, that does not sound like a successful implementation of an 'AI'. What would be the point of just parroting some existing content? There is not even an economic reason for doing this. The point is to train a model, some sort of neural net or similar structure, that does not store specific information but can synthesize new, unique content according to some prompt. If some crackhead or group of crackheads is trying to profit by creating an algorithm that regurgitates Sarah Silverman content, then yes, hire a bunch of lawyers and sue them. That sounds like a very stupid game. No legit engineer is actually trying to do that. I'm not cheerleading for some big corporate tech company. I am 100% in favor of open source, and I run my own little Linux+GPU cloudless experiments. I see the problem as reversed. It's corporations, copyright lawyers and other interests trying to gatekeep data and control who has access to what. I don't think it's healthy to have hypervigilant copyright laws blocking access to data, so that in the end only huge corporations will be able to pay all the fees to make progress with A.I.