Riverheart | 10 months ago

“A legal reality where you can only train AI on content you've licensed would be the worst for everybody bar massive companies, legacy artists included.”

Care to elaborate?

Also, saying artists only care about the legality of art used in AI out of distaste, when there are legal cases where their art has been appropriated, seems like a bold position to take.

It’s a practice founded on scooping everything up without care for origin or attribution, and it’s not like it’s a transparent process. There are people who literally go out of their way to let artists know they’re training on their art and taunt them about it online. Is it unusual that artists would assume bad faith from those purporting to train their AI legally, when participation up till now has been either involuntary or opt-out? Rolling out AI features when your customers are artists is tone-deaf at best and trolling at worst.

Workaccount2|10 months ago

There is no "scooping up"; the models aren't massive archives of copied art. People either don't understand how these models work or they purposely misrepresent it (or purposely refuse to understand it).

Showing the model a picture doesn't create a copy of that picture in its "brain". It moves a bunch of vectors around so that they capture an "essence" of what the image is. The next image shown, from a totally different artist with a totally different style, may well move many of those same vectors again. Suffice it to say, there is no copy of the picture anywhere inside of it.

This is also why these models hallucinate so much: they are not drawing from a bank of copies, they are working off of a fuzzy memory.
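
To make the "moving vectors around" point concrete, here is a minimal sketch. It is a toy linear model trained with plain SGD, not how any real image model is built; the names `train_step` and `train`, the feature dimension, and the made-up target are all assumptions for illustration. It only shows that every example nudges the same fixed-size set of weights and that the weight count does not grow with the number of examples. It says nothing about whether a large model can end up memorising specific images.

```python
import random

def train_step(weights, example, target, lr=0.01):
    """One SGD step for a linear model with squared-error loss."""
    prediction = sum(w * x for w, x in zip(weights, example))
    error = prediction - target
    # Each weight is nudged in proportion to its input feature.
    # The example itself is not stored anywhere.
    return [w - lr * error * x for w, x in zip(weights, example)]

def train(num_examples, dim=8, seed=0):
    """Stream `num_examples` random examples through the same weights."""
    rng = random.Random(seed)
    weights = [0.0] * dim
    for _ in range(num_examples):
        example = [rng.uniform(-1, 1) for _ in range(dim)]
        target = sum(example)  # hypothetical target: the sum of the features
        weights = train_step(weights, example, target)
    return weights

small = train(10)
large = train(10_000)
# Same number of parameters, no matter how many examples were seen.
print(len(small), len(large))
```

After 10 examples or 10,000, the model is still just `dim` numbers; what changes is their values.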

TeMPOraL|10 months ago

> People either don't understand how these models work or they purposely misrepresent it (or purposely refuse to understand it).

Not only that, they also assume or pretend that this is obviously violating copyright, when in fact this is a) not clear, and b) pending determination by courts and legislators around the world.

FWIW, I agree with your perspective on training, but I also accept that artists have legitimate moral grounds to complain and try to fight it - so I don't really like to argue about this with them; my pet peeve is on the LLM side of things, where the loudest arguments come from people who are envious and feel entitled, even though they have no personal stake in this.

ToucanLoucan|10 months ago

Training data at scale unavoidably taints models with vast amounts of references to the same widespread ideas that appear repeatedly in said data. Because the model has "seen" probably millions of photos of Indiana Jones, if you ask for an image of an archeologist who wears a hat and uses a whip, its weighted averages are going to lead it to create something extremely similar to Indiana Jones, simply because it has seen Indiana Jones so much. Disintegrating IP into trillions of pieces and then responding to an instruction to create it with something so close to the IP as to barely be distinguishable is still infringement.

The flip-side to that is the truly "original" images where no overt references are present all look kinda similar. If you run vague enough prompts to get something new that won't land you in hot water, you end up with a sort of stock-photo adjacent looking image where the lighting doesn't make sense and is completely unmotivated, the framing is strange, and everything has this over-smoothed, over-tuned "magazine copy editor doesn't understand the concept of restraint" look.

Riverheart|10 months ago

The collection of the training data is the “scooping up” I mentioned. I assume you acknowledge the training data doesn’t spontaneously burst out of the aether?

As for the model, it’s still creating deterministic, derivative works based off its inputs, and the only thing that makes it random is the seed, so it being a database of vectors is irrelevant.
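
A minimal sketch of the seed point, assuming every other source of nondeterminism (hardware, library versions, sampler settings) is held fixed. `generate` is a hypothetical stand-in for a sampler, not any real library's API; it just shows that a seeded pseudo-random generator gives the same output for the same prompt and seed.

```python
import hashlib
import random

def generate(prompt, seed, length=5):
    """Hypothetical stand-in for a sampler: same prompt and seed give the same output."""
    # Derive a stable integer seed from the prompt and the user-supplied seed.
    digest = hashlib.sha256(f"{prompt}|{seed}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Pretend these integers are pixel values.
    return [rng.randint(0, 255) for _ in range(length)]

first = generate("archeologist with a hat and a whip", seed=42)
again = generate("archeologist with a hat and a whip", seed=42)
other = generate("archeologist with a hat and a whip", seed=43)
print(first == again)  # same prompt and seed: identical output
print(first == other)  # different seed: a different draw
```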