item 42869194

kiviuq | 1 year ago

That's not the only issue. They want a guarantee that the model wasn't trained on copyrighted material.

TeMPOraL | 1 year ago

Now that is a real differentiator, for now. A lot of hesitation in embracing generative AI in large enterprises stems from uncertainty about copyright issues. Anyone who trained an o1-level model from scratch on only public or properly licensed data would be able to provide a very valuable service to those enterprise customers.

However, if both training and operating costs of a DeepSeek-like model are as small as they are, the companies best able to offer this service are... Microsoft, Amazon and Google. And second best are... teams inside the would-be customer enterprises themselves. $6M to train and $6K to run is effectively free for such companies; there is no moat here. The services that enterprise customers would happily buy instead of building are... operations, and assuming legal liability if the model turns out not to be safe from copyright infringement lawsuits. But those are exactly the services those companies are already buying from Microsoft, Amazon and Google.

fulafel | 1 year ago

This would result in some refreshing models; I guess they would be trained mostly on out-of-copyright material from 75+ years ago and wouldn't have knowledge of the modern world.

Maybe they could also skin the robotic bureaucrats with a vintage sci-fi appearance, for a fully consistent experience: when you go to the building-permits bot, there could be small talk about the latest Beatles record, etc.

TeMPOraL | 1 year ago

Enforcing copyright on training data to this extent would actually create a temporary moat for the biggest players - they can afford to hire a lot of cheap labor to supplement the training dataset with human-authored original works that skirt IP protections by interpreting, parodying, commenting on or otherwise describing the protected works without actually infringing on them. As long as they keep those datasets private, everyone else is shit out of luck.

(I'm reiterating my prediction wrt. AI and moats: the only mid-term moat there can be is in human labor. Hardware vendors benefit from selling better hardware to more people for less; software and research are cheap to scale; datasets eventually leak or get reproduced. Human labor is the one thing that doesn't scale, and, barring an economic crisis, only ever gets more expensive with time. Whatever edge one can get by applying human labor that cannot be substituted by AI - like RLHF and its evolutions - is the one that will last all the way to AGI; past that, moats won't matter anymore.)

This is one of the many reasons I'm firmly on the side of making the training of large neural models exempt from copyright considerations for everyone.