item 36902014


madsmith | 2 years ago

I believe LLMs should be allowed to read/view/consume content and learn from it even if that content has a copyright.

We phrase it like somehow the material is being copied into the LLM, but that’s not what it’s doing. It’s building a neural graph from the experience of consuming that content.

What would the world be like if humans couldn’t learn, couldn’t train the weights of the interconnects of their neural tissue, from any material with a copyright?

discuss


api | 2 years ago

It’s a form of lossy compression. Can I strip the copyright off an image by JPEG compressing it?
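The JPEG analogy above can be sketched with a toy lossy compressor. This is not real JPEG (which also involves a DCT and entropy coding); it only mimics the quantization step, the part where detail is irreversibly discarded:

```python
# Minimal sketch of lossy compression: coarse quantization of 8-bit
# sample values, loosely analogous to the quantization step in JPEG.
# Illustrative only -- real JPEG also uses a DCT and entropy coding.

def compress(pixels, step=32):
    # Keep only the quantized level for each pixel (lossy).
    return [p // step for p in pixels]

def decompress(levels, step=32):
    # Reconstruct an approximation; the discarded detail is gone for good.
    return [lv * step + step // 2 for lv in levels]

original = [0, 37, 90, 128, 200, 255]
restored = decompress(compress(original))

print(original)  # the source data
print(restored)  # recognizably the same data, but not bit-identical
```

The round trip never returns the original bits, yet the reconstruction is still recognizably the same work, which is exactly why lossy re-encoding doesn't strip a copyright.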

At the very least I think LLMs trained on data that the trainer does not own or have rights to use in that manner should not be copyrightable.

madsmith | 2 years ago

All knowledge is lossy compression.

My thinking “the enemy gate is down” when I encounter the tokens “Ender’s Game” is my recalling a learned association between those token strings.

My knowing that doesn’t strip the copyright. My telling someone the meaning and context of the phrase generally doesn’t strip the copyright away from Orson Scott Card either. I’m not reproducing his work but my knowledge of it. And whether I’ve violated his copyright depends on what I do with that knowledge and how.

We are prosecuting the LLMs for possessing fragments of knowledge. And we’re assuming that the recall of some of those fragments means a copy of that work is in fact contained within the weights.

bick_nyers | 2 years ago

An LLM is a lossy compression of the internet and I think it should be treated as such. You can't copyright the internet itself.

hmcq6 | 2 years ago

A mathematical transformation of the data is not enough to qualify as a transformative work. Saving a copyrighted work in a lossy compression format does not negate the copyright.