top | item 39183007

logiduck | 2 years ago

Copyright laws do not support your argument.

There have been many cases in music where the creators of the offending song were forced to pay because it was "close enough" to the curve but not touching it.

andrewla|2 years ago

I think it's an apt analogy, though I disagree about the implication.

If I use ChatGPT to create a work, and that work is "close enough" to an existing copyrighted work, then it seems like I am guilty of copyright violation, not ChatGPT.

corethree|2 years ago

It's not an analogy. This is actually what is done with ML. It is literally a best fit curve problem.

Or maybe it actually is an analogy, but if that were the case, the entire field of ML would only be able to understand the intricacies of ML through the analogy of curve fitting, and what's actually going on underneath the analogy would remain elusive.

JoshuaRogers|2 years ago

Or both: when downloading music, both the one downloading and the one uploading can face legal action.

corethree|2 years ago

True, but most points on the curve aren't close to any data point. That means most of the output of ML is completely original. Let's use a simplified example of a straight line between two data points. Example:

   Point A ------|----------------------------------------|------Point B
For the line segment above, most of the line is not in the vicinity of either point. The boundary of closeness for points A and B is marked figuratively with a pipe "|": any part of the segment between a pipe and its point is too close, and anything else is an original work. This intuition still applies even if the line only passes near the points without touching either one. Basically, the majority of ML output is not even close to a copy, because most of the curve is far from any point.

The only way for most of the line to be a copy is if the data points are clustered so closely that the data itself is mostly a chain of "similar" copies. Not sure if you're catching my meaning here. Example:

   Point A --|--|-- Point B
Above, the thresholds of A and B overlap, so the two are essentially "close enough" copies of each other. The left pipe is the threshold for B and the right pipe is the threshold for A. As a result, the entire line between the two points must be an illegal "close enough" copy as well.
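
To put rough numbers on the two pictures above, here is a minimal sketch in Python. It assumes a one-dimensional straight segment and the same closeness threshold for both points; the function name `close_fraction` and the numbers in the usage lines are made up for illustration and are not taken from any real model or legal standard.

```python
def close_fraction(distance: float, threshold: float) -> float:
    """Fraction of the segment A-B lying within `threshold` of A or of B.

    Hypothetical helper: `distance` is the length of the segment between
    the two data points, `threshold` is the assumed "close enough" radius.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # Each endpoint covers a stretch of length `threshold` from its own end.
    # Once the two stretches meet (2 * threshold >= distance), the whole
    # segment is covered, so the fraction is capped at 1.
    return min(1.0, 2.0 * threshold / distance)


if __name__ == "__main__":
    # First picture: points far apart, small thresholds near each end.
    print(close_fraction(distance=50.0, threshold=6.0))
    # Second picture: thresholds overlap, so the whole line is "too close".
    print(close_fraction(distance=6.0, threshold=4.0))
```

With far-apart points only a small share of the segment counts as "close enough"; once the thresholds overlap, the share is the whole segment, which is the second picture.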

If such data is used, it means the existing data is already in violation of copyright law. The logical implication is this:

For most of the results of ML to be a technical "close enough" copy of the datapoints, you must also admit that most of your data contained "close enough" copies as well.

As a side note, this kind of thing can be useful for defining a quantitative measure of what "close enough" even means, as we can certainly define a numeric threshold between close enough and not close enough for copyright law.

yokem55|2 years ago

Because music has a lot of additional law written giving additional protections to songwriters independent of performers and recordings. That gives the abstract tonal sequence its own copyright.