There's a difference between feeding massive amounts of copyrighted material to a training process that blends them thoroughly and irreversibly, and doing all that in-house, vs. offering people a service that indexes (and possibly partially rehosts) that material, enabling and encouraging users to engage directly in pirating concrete copyrighted works.
Ironically, the low-tech infringing proposal would lead to more reliable results grounded in the raw contents of the data, using less computing/power and without the confidently incorrect sycophancy we see from the LLMs.
There's this famous phrase in Russian that was born out of a short interview with a woman, a strong Putin supporter, that's often been used as a sarcastic remark for pointing out someone's double standards and/or hypocrisy.
It can be roughly translated to "you don't understand, it's a completely different situation". That's what's constantly on my mind when I'm reading discussions like this one.
Everybody and their dog torrenting petabytes of data and getting away with it (Meta is the only one that got caught and they've still gotten away with doing it)?
The very same data poor American students were forced to commit suicide over? The same data that average American housewives were sued over for millions of dollars of "damages"? The same data that often gets random German plumbers or steelworkers to pay thousands of euros of "fines" to the copyright mafia so they won't get sued and have their lives ruined?
Yet when giant corporations are doing the exact same thing on a massive scale, it's fine? It's not even the same thing: an American student torrenting books isn't making any money off it, while Meta very much is.
Of course it's not the same; a simple-minded and poorly educated person like me isn't capable of understanding the difference. You keep believing in your moral superiority; the rest of the world has finally woken up.
> > or some other country that doesn't respect international copyright though.
> Like the US? OpenAI et al. don't give a shit.
OpenAI is not a country and therefore cannot make laws that don't respect international (or domestic) copyright. Also the US is a lot bigger than OpenAI and the big tech corps, and the law is very much on the side of copyright holders in the US.
> the law is very much on the side of copyright holders in the US.
Remind me again what the status of the case is with Meta/Facebook using pirated material to train their proprietary LLMs, and even seeding the data back to the community while downloading it?
The money is definitely on the side of big tech vs. book publishers. There may be a nominal settlement to end the matter, perhaps after a decade of litigation.
TeMPOraL|9 months ago
sellmesoap|9 months ago
corgi912|9 months ago
r14c|9 months ago
gosub100|9 months ago
It's okay, you can say 'laundering'
freedomben|9 months ago
diggan|9 months ago
gosub100|9 months ago