
twayt | 2 years ago

> They probably can:

No, actually they probably can’t. There is no verifiable way to remove data from a model short of deleting every instance of the information from the training set and retraining from scratch. The project you linked only describes a selective finetuning approach.
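For context, this is roughly what that kind of selective finetuning looks like (a minimal PyTorch sketch of gradient-ascent "unlearning", not any particular project's code; the function name and hyperparameters are made up). It suppresses behavior on a forget set but leaves the data's influence entangled in the weights:

    import torch
    from torch.utils.data import DataLoader

    def unlearn_by_gradient_ascent(model, forget_set, lr=1e-5, epochs=1):
        # Finetune the model to RAISE its loss on the data to be "forgotten".
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for inputs, labels in DataLoader(forget_set, batch_size=8, shuffle=True):
                opt.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(inputs), labels)
                (-loss).backward()  # ascend instead of descend
                opt.step()
        # The returned weights still reflect the forgotten data's influence;
        # nothing here constitutes verifiable removal.
        return model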


xnx | 2 years ago

twayt | 2 years ago

Until you get models with completely disentangled feature spaces, such that you know the influence of a piece of data has been completely removed (at the limit this is something like an embedding DB; see the sketch below), there is absolutely no way you can claim you’ve removed the data from the model.

At most, these efforts will amount to data laundering: making it impossible to prove that a piece of data was used to train the model, rather than providing conclusive proof that it was removed.
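To make that embedding-DB limit concrete, here is a toy sketch (a hypothetical class, not a real library): in a retrieval store, deleting a record verifiably removes its influence on every future query, which is exactly the property entangled weights lack:

    import numpy as np

    class EmbeddingStore:
        def __init__(self):
            self.vectors = {}  # doc_id -> embedding

        def add(self, doc_id, vec):
            self.vectors[doc_id] = np.asarray(vec, dtype=np.float32)

        def delete(self, doc_id):
            # Verifiable removal: after this, doc_id cannot affect any query.
            self.vectors.pop(doc_id, None)

        def query(self, vec, k=5):
            q = np.asarray(vec, dtype=np.float32)
            scored = sorted(self.vectors.items(),
                            key=lambda kv: -float(q @ kv[1]))
            return [doc_id for doc_id, _ in scored[:k]]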

NBJack | 2 years ago

Which means we are probably at least 5-10 years away from verifiable removal that a court of law will recognize.

brucethemoose2 | 2 years ago

They can probably prevent LLaMA from spitting out verbatim quotes from the books well enough to make proof difficult.

... But yeah, fundamentally the only way to throw out the books is to throw out the weights.
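For illustration, here is a rough sketch of the kind of verbatim-extraction probe a plaintiff might run (assuming a Hugging Face causal LM; the model id is a placeholder): feed the model a known passage's prefix and check whether greedy decoding reproduces the continuation. Suppressing this makes proof difficult, per the comment above, without proving the data was removed:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    def reproduces_verbatim(model_id, prefix, continuation, max_new_tokens=64):
        tok = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id)
        ids = tok(prefix, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
        # Decode only the newly generated tokens, then check for the known text.
        generated = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
        return continuation.strip() in generated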