top | item 43465333

OpenAI Says It's "Over" If It Can't Steal All Your Copyrighted Work

66 points | raju | 11 months ago | futurism.com

79 comments


hagbard_c|11 months ago

Yes, well, in a way they're right, and I suspect everyone here knows it, no matter how high and mighty they might want to act when commenting. When foreign (here 'Chinese') competition simply ignores copyright laws while 'western' companies have to abide by them for every piece of data they use to train their models, the former will have a clear advantage over the latter. This also happens to be how the USA acted in the 1800s [1]:

the United States declined an invitation to a pivotal conference in Berne in 1883, and did not sign the 1886 agreement of the Berne Convention which accorded national treatment to copyright holders. Moreover, until 1891 American statutes explicitly denied copyrights to citizens of other countries and the United States was notorious in the international sphere as a significant contributor to the "piracy" of foreign literary products. It has been claimed that American companies for the most part "indiscriminately reprinted books by foreign authors without even the pretence of acknowledgement" (Feather, 1994, 154). The tendency to freely reprint foreign works was encouraged by the existence of tariffs on imported books that ranged as high as 25 percent (see Dozer, 1949).

[1] http://socialsciences.scielo.org/scielo.php?script=sci_artte...

raxxorraxor|11 months ago

Plus, in this case I don't even think it's a copyright violation to analyze protected works. And even creating derivatives of some form isn't, since that is how any form of art works in the first place.

Sure, perhaps they would need a license to get the material, but I don't see how broken copyright laws will be of any help here.

cbsmith|11 months ago

The US has found ways to get China to adhere to IP laws, but they're never going to agree to restrictions the US doesn't impose on itself. Presuming that there is no way China will respect IP laws is BS.

JohnFen|11 months ago

What I don't understand is why this is always presented as a "race" that "we" have to win or else. It's just such a strange framing to me and every time I see it, it's presented as some sort of self-evident truth, but I don't think it's self-evident at all.

r00fus|11 months ago

The "race" analogy is entirely driven by venture capital framing: VCs are interested in controlling the market, usually by reaching a dominant position within the overall space, thereby crowding out new entrants and being able to direct where the market goes.

China's leading efforts, on the other hand, take the long view: by releasing their products as open source, they can improve on each other's work. No one controls the market, but there is constant competition and innovation.

All this is beside the point, however, for this article claims that OpenAI is using China as an excuse to gain unfettered access to all copyrighted works through the fair use loophole.

So the crux is whether we believe in "innovation uber alles" or intellectual property rights.

thewanderer1983|11 months ago

Nature is a complex system. Many species are in competition, not just humans. Most of these systems form a balance (see biodiversity). Due to resource scarcity, power tends to concentrate, which gives these power structures an advantage. Humans form these power structures around groups. This has been happening for as long as humans have tribalised. Right now, humans can form these groups at nation-state-level complexity and, to some extent, globally. This is humanity's current best effort. If you can do better, please do.

scoofy|11 months ago

I mean, I do think we should want to win the race; the point is that they want to keep all the money. You could literally just offer equity as compensation to "content providers" and we wouldn't have any liquidity problems on the development side, and people could still be compensated or opt out.

OpenAI doesn't want to do that.

EdwardKrayer|11 months ago

It seems that most people on this site believe that this is a good thing, but all this restriction would mean is that for the next while, the only companies able to afford mass licensing would be in the S&P 500, and that's assuming these companies wouldn't just flock to a nation outside of America's influence.

At some point, it becomes a national security issue. This technology is going to be leveraged in ways we can't even dream up today. Copyright law needs to be re-imagined in a way that won't restrict advancement in AI, and AI-adjacent technology. It's not because we want to - it's because we have to.

slowtrek|11 months ago

It's not that hard. If you want to ask questions about or work with a Stephen King book, you rent it during your LLM session. OpenAI would make a small fee, the author would get the majority, and the user gets value. You don't have to be a billion-dollar company to set up a monetization structure like that. Startups could do this if they negotiate with authors.
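A per-session rental split like the one described could be sketched roughly as follows; the default fee and the 70/30 split are illustrative assumptions (the comment only says the author gets "the majority" and the platform "a small fee"):

```python
def split_rental_fee(session_fee: float, author_share: float = 0.70):
    """Split a per-session book rental fee between author and platform.

    author_share is an assumed majority cut, not a figure from the
    proposal; adjust it per negotiation with the author.
    """
    author_cut = session_fee * author_share
    platform_cut = session_fee - author_cut
    return author_cut, platform_cut

# A hypothetical $1.00 rental: $0.70 to the author, $0.30 to the platform.
author_cut, platform_cut = split_rental_fee(1.00)
```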

For general questions, you can use the free wiki that's ingested into the LLM or pay a fee for general content like current events.

You keep the LLM free in the third world out of necessity. OpenAI, in the first world, cannot ask to be treated as if it were a third-world company, because we are too rich to be that ridiculous.

preinheimer|11 months ago

Why? Why do we have to? Why do companies get to take the creative output of humanity for free to make a profit?

Why is it a national security issue? Because people who could make billions of dollars say so?

basisword|11 months ago

You're assuming this leads...somewhere. Currently, AI is not all that useful. And progress seems to be slowing, not accelerating.

techpression|11 months ago

I call XY on this. The problem is inherent in LLMs, and the solution is something else altogether, not just allowing companies to ignore the law and lobby for changing said law after the fact.

neilv|11 months ago

It sounds like government continuing to honor the property rights of everyone is getting in the way of a handful of rich people's desire to take all that value for themselves.

ls612|11 months ago

By this logic Google Search couldn't exist. Except that Google won those cases.

slowtrek|11 months ago

So basically, we know China is never going to pay the publishers/content creators (never). If we hold OpenAI to our principles (pay who you took from), they will go bankrupt. So of course they are speaking in end-game language. To suggest the race is lost even before it starts is an incredible thing.

How is it that we can theorize that the model would get better with more data, but we can't theorize that the business model would need to get bigger (pay the content creators) to train the model? Shoot first and ask questions later (or rather, BEG later).

whatthedangit|11 months ago

You know, there's a creative third way which the US could take if it had the cojones.

Allow OpenAI and other AI companies to use all data for training, but require that they pay it forward through royalties on profits beyond some threshold X, where X is a number high enough to imply true AGI was reached.

The royalties could go into a fund that would be paid out, like social security payments, to every American starting when they turn 18. Companies could likewise request a one-time deferred payment or something like that.

It's having your cake and eating it. Also helping ease some tensions around job loss.
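The scheme above could be sketched in a few lines; the threshold, royalty rate, and population figure are all illustrative assumptions, not numbers from the proposal:

```python
AGI_PROFIT_THRESHOLD = 100e9  # hypothetical "X": profit implying true AGI
ROYALTY_RATE = 0.30           # assumed royalty rate on profit above X
ADULT_POPULATION = 260e6      # rough assumed US adult (18+) population

def royalty_owed(profit: float) -> float:
    """Royalty due on the portion of profit above the AGI threshold."""
    return max(0.0, profit - AGI_PROFIT_THRESHOLD) * ROYALTY_RATE

def per_person_payout(total_fund: float) -> float:
    """Even split of the accumulated fund across all adults."""
    return total_fund / ADULT_POPULATION

# With these assumed numbers, $150B in profit puts $15B into the fund,
# split evenly across the adult population.
fund = royalty_owed(150e9)
payout = per_person_payout(fund)
```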

Sadly, what we'll likely get is a bunch of tech leaders stumbling into wild riches, hoarding it, and then having it taken from them by force after they become complacent and drunk on power without the necessary understanding of human nature or history to see why they've brought it on themselves.

re-thc|11 months ago

> So basically, we know China is never going to pay the publishers/content creators (never)

Who is we? How do you know? Never is a strong word.

> If we hold our principles to OpenAI (pay who you took from), they will go bankrupt.

i.e. their business wasn't feasible to begin with? Sounds fine? What's wrong with them going bankrupt (if it comes to that)?

grimblee|11 months ago

So, does that mean that OpenAI's models will be open source then? I mean, if they're built on our collective intellectual property, it's only fair we have free access to them.

giancarlostoro|11 months ago

I think we just need to rethink copyright for language models. I'm okay with licensing one copy of a work to an LLM throughout its various generations. Just don't pirate it if no special license is available; buying the ebook should suffice. It should be no different from a human buying a copy. The only rule should be that it does not leak the entire work.

JohnFen|11 months ago

I'm not OK with that, though... and here we have the nut of the problem. There is no agreement as to what's acceptable and what's not.

I personally think the odds of me being able to both publicly publish my words and code and keep them out of training data are pretty close to zero. Since that's unacceptable to me, my only option is not to publish that stuff at all.

CuriouslyC|11 months ago

It's always interesting to see how the title of a HN post radically changes the people who comment and vote. The AI friendly people are being carpet bombed by haters, but in a model release thread the haters would be flagged to oblivion.

archagon|11 months ago

“Haters” is nothing more than a thought-terminating cliché.

josefritzishere|11 months ago

The product requires crime? I feel like most products do not require crime. This is not a good sales pitch.

r00fus|11 months ago

The product doesn't require crime, but the massive profitability of their business model requires it.

Hence the red herring that "China will steal it if we don't do it first".

jug|11 months ago

Either that, or copyright law is bad in its current form and LLMs are yet another example of what exposes that.

Even if copyright owners can't point to how much damage, if any, they suffer from AI, it's seen as wrong and bad. I think it's getting boring to hear that story about copyright repeat itself. For most crimes, you need to be able to point to damage that was done to you.

Also, while there are edge cases in some LLMs where you can make them spew some verbatim training material, often through jailbreaks or whatnot, an LLM is a destructive process involving "fuzzy logic" where the content is generally not perfectly memorized. It seems no more of a threat to copyright than recording broadcasts onto cassette tapes or VHS was back in the day. You'd be insane to use that stuff as a source of truth on par with the original article, etc.

slowtrek|11 months ago

Can someone please vouch for this thread and unflag it? It's kind of the main tech issue of our time ...

basisword|11 months ago

Some good news for a change!

scudsworth|11 months ago

something tells me that this pathetic messaging approach is not going to be the one that squares the circle between "piracy is illegal" and "information wants to be free"

bpodgursky|11 months ago

Sorry, but it actually is a huge problem for the US if the DeepSeek models are able to train on sorta-illegal dumps of scientific papers (the ones paywalled by scientific journals) and US models aren't.

Everyone WILL start using hosted frontier Chinese models if they are demonstrably better at answering scientific questions than ChatGPT, sending essentially all US research questions into a Chinese data dump. This is even worse than the national security catastrophe that is TikTok (even aside from the EVEN BIGGER issue that China will have models that are staggeringly better than those in the US, because they are up to date on the science).

I understand the reflexivity against AI companies "stealing content" but we need to stay competitive and figure out the financial compensation later. This is not a case where our unbelievably generous copyright laws should take precedence over US competitiveness.

slowtrek|11 months ago

Why is this flagged?

cadamsdotcom|11 months ago

You have to remember a company is not a social being with balanced obligations. Its obligation is to its owners and not to society.

If OpenAI’s leadership weren’t saying precisely this, they wouldn’t be doing their jobs.

basisword|11 months ago

>> Its obligation is to its owners and not to society.

This isn't true at all. It has an obligation to follow the law of the society it operates in, even if that results in lower profits.

slowtrek|11 months ago

Yeah, we have the wrong conception. That's fine; society often has wrong conceptions. We are just dead wrong about ruthless capitalism. A company is a custodian of a good society; it has responsibilities that far exceed profit.

nadermx|11 months ago

Copyright infringement is not stealing[0]. The person still has what they made. Not sure why they propagate it as theft. It seems like a pro-copyright extremist propaganda article which goes significantly against the progress of the arts and sciences.

[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...

klabb3|11 months ago

> against progress of advancements for arts and sciences.

No, it's always been against commercialization. That's why we have exceptions like political commentary, satire and, in particular, the arts and sciences. The issue is about making money from someone else's work.

You can still disagree with it of course, but let’s have an honest discussion.