

EricLeer | 2 years ago

I wonder if it would be possible to probe what a model was trained on by using prompts that can only be answered well if certain data was in its training set.

For instance, if I have a body of text that can't be found anywhere else on the internet and the model's reply references information from that text, you could be fairly confident the text was used in training.

The hard part is probably finding such a body of text.
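A minimal sketch of that probe, assuming the model is reachable through a hypothetical `complete(prompt) -> str` function. It splits the unique text into prefix/continuation pairs, feeds each prefix to the model, and counts how often the model reproduces the held-out continuation word for word. `make_probes`, `verbatim_hit_rate` and the word counts are illustrative choices, not an established tool.

```python
from typing import Callable, List, Tuple


def make_probes(text: str, prefix_words: int = 12,
                target_words: int = 6) -> List[Tuple[str, str]]:
    """Cut the text into non-overlapping (prefix, held-out continuation) pairs."""
    words = text.split()
    step = prefix_words + target_words
    probes = []
    for start in range(0, len(words) - step + 1, step):
        prefix = " ".join(words[start:start + prefix_words])
        target = " ".join(words[start + prefix_words:start + step])
        probes.append((prefix, target))
    return probes


def verbatim_hit_rate(text: str, complete: Callable[[str], str],
                      prefix_words: int = 12, target_words: int = 6) -> float:
    """Fraction of probes where the model's output begins with the exact
    held-out continuation (compared word by word, ignoring whitespace)."""
    probes = make_probes(text, prefix_words, target_words)
    if not probes:
        raise ValueError("text is too short for the requested probe sizes")
    hits = 0
    for prefix, target in probes:
        expected = target.split()
        produced = complete(prefix).split()[:len(expected)]
        if produced == expected:
            hits += 1
    return hits / len(probes)
```

A hit rate well above what the same model scores on comparable text written after its training cutoff would be evidence (not proof) of training on the text; a low rate proves little, since models do not memorize everything they see.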


srvmshr | 2 years ago

That idea was explored in an ICML 2020 paper not long ago:

Radioactive data: tracing through training

    Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value).
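A rough sketch of where such a p-value can come from, under a simplified reading of the setup: the mark is a random "carrier" direction u in feature space, and the test statistic is the cosine similarity between u and the trained classifier's weight vector w. If the model never saw the marked data, u behaves like a uniformly random direction relative to w, so the p-value is the chance that a random unit vector in R^d has cosine at least τ with a fixed vector. The closed form is a regularized incomplete beta function; this sketch instead integrates the null density numerically so it needs only the standard library. `cosine` and `null_tail` are hypothetical helper names, not the authors' code.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], w: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, w))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in w)))


def null_tail(tau: float, d: int, n: int = 20_000) -> float:
    """P(cos >= tau) when one vector is a uniformly random direction in R^d.

    Under that null, cos has density C * (1 - t^2)^((d - 3) / 2) on [-1, 1],
    with C = Gamma(d/2) / (sqrt(pi) * Gamma((d - 1)/2)).
    """
    if d < 3:
        raise ValueError("this sketch assumes d >= 3")
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    k = (d - 3) / 2
    c = math.exp(math.lgamma(d / 2) - 0.5 * math.log(math.pi) - math.lgamma((d - 1) / 2))

    def density(t: float) -> float:
        # max() guards against tiny negative values from rounding
        return c * max(0.0, 1.0 - t * t) ** k

    # Composite Simpson's rule over [tau, 1].
    h = (1.0 - tau) / n
    total = density(tau) + density(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * density(tau + i * h)
    return total * h / 3
```

Usage would be along the lines of `null_tail(cosine(u, w), d=len(u))`: a tiny value means the alignment between the carrier and the weights is very unlikely to be chance.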