

Yes, this doesn't use attribution techniques like influence functions or Shapley values that are popular in machine learning research, but I'm pretty convinced that even a nearest-neighbor search is better than the current baseline offered by "AI art systems," which is to shrug and say nothing about the role of human-created training data in producing the outputs.

As far as I know, nobody is even thinking about running the very expensive experiments needed to get ground-truth data for formal attribution techniques in the generative AI context. For a given prompt, you would have to retrain the model with a particular training example or group of examples omitted or added, then see how the output changes. So we're nowhere near building true attribution systems for these very large models. In the meantime, centering the training data will be net good for public discourse on the topic.
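To make the retraining experiment concrete, here is a rough sketch of the leave-one-out version on a toy model. Everything in it is an assumption for illustration: `fit_slope` stands in for "training" (a one-parameter least-squares fit), and `leave_one_out_effects` and the data are made up. It is not how any real generative system is trained or evaluated.

```python
def fit_slope(examples):
    """Toy stand-in for training: least-squares slope of y = w * x."""
    num = sum(x * y for x, y in examples)
    den = sum(x * x for x, _ in examples)
    return num / den


def leave_one_out_effects(examples, query_x):
    """Retrain once per training example with that example omitted, and
    record how the output at query_x shifts relative to the full model."""
    full_output = fit_slope(examples) * query_x
    effects = []
    for i in range(len(examples)):
        held_out = examples[:i] + examples[i + 1:]
        effects.append(full_output - fit_slope(held_out) * query_x)
    return effects


# Hypothetical training set: the third point is an outlier that pulls the fit.
train = [(1.0, 1.0), (2.0, 2.0), (3.0, 9.0)]
effects = leave_one_out_effects(train, query_x=1.0)

# Index of the example whose removal changes the output the most.
most_influential = max(range(len(train)), key=lambda i: abs(effects[i]))
print(most_influential, effects)
```

The loop calls the training routine once per training example. Here that is a one-line formula; for a very large generative model each call would be a full training run, which is why this kind of ground truth is so expensive to collect.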

That said, I see why people want to push back on some of the language used here.
