pu_pe | 5 days ago
I don't quite grasp how to interpret the training data attribution process. For example, it seems to say that for a given sentence like "They argued that humans tend to weigh losses more heavily than gains, leading to risk aversion", 24% is attributed to Wikipedia and 23% to arXiv.
Does that mean that the concepts used in this sentence are also found in those datasets, and that's what's getting compared here? Or does it mean that you can track down which parts of the training data were interpolated to create that sentence?
adebayoj | 5 days ago
We can attribute to exact sentences and chunks in the training data. For the first release, we are sharing only concept similarities. Over the coming weeks, we'll share and discuss how you can actually map to the exact training sentence and chunk with the model.
For a technical overview of how some of these models work, check this link out: https://www.guidelabs.ai/post/prism/
xmcqdpt2 | 5 days ago
From reading your second link (and please tell me if I got it wrong), it sounds like it isn't actually tracking back to training data, but to prototypes, which are then linked a posteriori to likely sections of the training data. The attribution isn't exact, right? It's more like "these are the likely texts that contributed to one of the prototypes that produced the final answer." Specifically, the section of PRISM titled "Nearest neighbour Search" sounds like you could have a prototype that draws on 1000 sources, with 3 of them contributing more than the others; the model identifies those 3, but the others might matter just as much in aggregate?
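To make the concern concrete, here is a minimal sketch of nearest-neighbour attribution over a single prototype. All names and data are hypothetical (random embeddings standing in for training chunks), and this is just one plausible reading of that step, not the actual PRISM implementation: the top-k chunks by similarity get reported, while the long tail of weaker matches is dropped even if it matters in aggregate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one prototype vector and 1000 training-chunk embeddings.
prototype = rng.normal(size=64)
chunks = rng.normal(size=(1000, 64))

# Cosine similarity between the prototype and every training chunk.
sims = chunks @ prototype / (
    np.linalg.norm(chunks, axis=1) * np.linalg.norm(prototype)
)

# Nearest-neighbour attribution: report only the top-3 chunks...
top3 = np.argsort(sims)[-3:][::-1]

# ...while the remaining 997 weaker matches are discarded, even though
# their combined similarity mass can rival or exceed the top-3's.
top3_mass = np.abs(sims[top3]).sum()
tail_mass = np.abs(np.delete(sims, top3)).sum()
```

With random embeddings like these, the dropped tail typically carries far more total similarity mass than the three reported neighbours, which is exactly the aggregation worry raised above.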
It says that the decomposition is linear. Can you remove a given prototype and infer again without it? That would be really cool.
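If the decomposition really is linear, the ablation described above is cheap: each prototype contributes an additive term to the output, so you can zero one term out and re-sum, with no re-inference needed. A minimal sketch, using hypothetical per-prototype logit contributions rather than anything from the PRISM post:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear decomposition: the model's logits are a sum of
# per-prototype contributions (e.g. similarity times a learned weight).
n_protos, n_classes = 5, 3
contribs = rng.normal(size=(n_protos, n_classes))

logits_full = contribs.sum(axis=0)

# "Remove" prototype 2 by zeroing its contribution; linearity means the
# rest of the sum is unchanged, so no re-run of the model is required.
ablated = contribs.copy()
ablated[2] = 0.0
logits_without_p2 = ablated.sum(axis=0)

# The difference between the two outputs is exactly prototype 2's term.
delta = logits_full - logits_without_p2
```

Under this reading, `delta` is a direct counterfactual: how the output would shift without that prototype, which is what the question is asking for.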