top | item 41695462



I think the most acheivable way of having some verification of AI images is simply for the AI generators to store finger prints of every image they generate. That way if you ever want to know you can go back to Meta or whoever and say "Hey, here's this image, do you think it came from you". There's already technology for that sort of thing in the world (content ID from youtube, CSAM detection etc.).

It's obviously not perfect, but it could help, and it doesn't have the enormous side effects of trying to lock down all image generation.
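The registry idea above can be sketched in a few lines. This is a minimal, hypothetical illustration: the names (`REGISTRY`, `record_generated`, `probably_ours`) and the fixed-length bit-string fingerprint are assumptions, not anything a real generator actually exposes. The lookup tolerates a few differing bits, since a fingerprint of a re-encoded copy won't match exactly.

```python
# Hypothetical fingerprint registry: the generator records a perceptual
# hash (here, a bit string) for every image it outputs, and a later
# lookup asks "do you think this image came from you?"

REGISTRY = set()  # fingerprints of every image the generator produced


def record_generated(fingerprint: str) -> None:
    """Called by the generator for each image it outputs."""
    REGISTRY.add(fingerprint)


def probably_ours(fingerprint: str, max_distance: int = 4) -> bool:
    """Fuzzy lookup: match if any stored fingerprint is within a few bits."""
    def hamming(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))
    return any(hamming(fingerprint, h) <= max_distance for h in REGISTRY)


record_generated("1011001011010010")
print(probably_ours("1011001011010110"))  # one bit off: True
print(probably_ours("0100110100101101"))  # unrelated:   False
```

A real system would use a proper perceptual hash and a nearest-neighbor index rather than a linear scan, but the shape of the protocol is the same.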



Someone | 1 year ago

> That way if you ever want to know you can go back to Meta or whoever and say "Hey, here's this image, do you think it came from you".

Firstly, if you want to know an image isn’t generated, you’d have to go to every ‘whoever’ in the world, including companies that no longer exist.

Secondly, if you ask evil.com that question, you would have to trust them to answer honestly both about images they generated and about images they didn't (a false claim that a real picture was generated could be career-ending for a politician).

This is worse than https://www.cs.utexas.edu/~EWD/ewd02xx/EWD249.PDF: “Program testing can be used to show the presence of bugs, but never to show their absence!”. You can neither show an image is real nor that it is fake.

kortex | 1 year ago

What's to stop someone from downloading an open source model, running it themselves, and simply not sharing the hashes, or subtly corrupting the hash algorithm so that it gives false negatives?

Also, you need perceptual hashing (a single bit flip of the generated media changes a cryptographic hash entirely), which is squishy and not perfectly reliable to begin with.

alkonaut | 1 year ago

Nothing. But that’s not the point. The point is that, to a rounding error, all output is made by a small number of models from a small number of easily regulated companies.

It’s never going to be possible to ensure all media is reliably tagged somehow. But if just half of generated media is identifiable as such, that helps. It also helps keep generated media out of the training sets of new models, which could turn out to be useful.