Take estimated losses of the NYT from this "innovation" and multiply by 10^x where is "x" high enough to make tech companies stop and think before they break laws next time. That would be my approach at least.
The training isn't the issue per se, it's the regurgitation of verbatim text (or close enough to be immediately identifiable) within a for-profit product. Worse still that the regurgitation is done without attribution.
The legal argument, which I'm sure you are very well aware of, is that training a model on data, reorganizing, and then presenting that data as your own is copyright infringement.
necroforest|2 years ago
noitpmeder|2 years ago
LargeTomato|2 years ago
anamexis|2 years ago
unknown|2 years ago
[deleted]
zztop44|2 years ago