Can someone who understands this work explain why using a range (de)coder to decompress your ciphertext into the GPT2-generated distribution (in other words, the most obvious construction for the case where the sender and receiver share the model) is insufficient to achieve perfect security by their definition?
If I’m not mistaken, you still need the key itself to undo the steganography. They provide a way to autoregressively rewrite the decoded message with the aid of the key. I think the perfect secrecy part means that embedding the plaintext doesn’t cause any divergence from the model’s output distribution, so it’s undetectable. You just adapt the covertext in a way that carries about the same information content but expresses it via, e.g., different words.
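To make the "range decoder" construction concrete, here is a minimal toy sketch (my own illustration, not the paper's actual algorithm): uniformly random ciphertext bits are treated as a binary fraction and arithmetically *decoded* against a next-token distribution, so the emitted tokens look like an ordinary sample from the model. The fixed `DIST` below is a made-up stand-in; a real system would query GPT2 for the distribution at each step.

```python
def bits_to_fraction(bits):
    """Interpret a bit string as a binary fraction in [0, 1)."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

def embed(bits, dist, n_tokens):
    """Arithmetic decoding: at each step, choose the token whose
    cumulative-probability interval contains the ciphertext fraction,
    then rescale the fraction into that interval."""
    x = bits_to_fraction(bits)
    out = []
    for _ in range(n_tokens):
        lo = 0.0
        for tok, p in dist:
            if x < lo + p:
                out.append(tok)
                x = (x - lo) / p  # rescale into the chosen interval
                break
            lo += p
    return out

# Hypothetical stand-in for a model's next-token distribution.
DIST = [("the", 0.5), ("a", 0.25), ("cat", 0.125), ("dog", 0.125)]

tokens = embed([1, 0, 1, 1, 0, 1], DIST, 4)  # -> ["a", "cat", "a", "the"]
```

If the ciphertext bits are uniformly random, token `t` is emitted with probability equal to its interval width, i.e. exactly its model probability, which is why the output is distributed like ordinary model samples.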
I think it's the LLM (GPT2) link that makes this interesting to most since it allows relatively high data rate hidden/deniable communication through text interfaces.
It's been known for a long time that the bandwidth of steganography is limited to the amount of randomness in the open data, e.g. Anderson and Petitcolas, "On the limits of steganography", '98 (I hope that's the right one anyway - it's been a while since I read it). Which implies that if someone came up with a good reason to publish a bunch of high-entropy randomness, the world would have a much greater bandwidth of potential steganographic channels.
Now AI has done this for us. I suppose that under authoritarian regimes you will soon have to cryptographically prove that you generated your random bits deterministically from specific keys.
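A back-of-envelope version of that bandwidth bound: the embeddable payload per token is capped by the Shannon entropy of the cover channel's distribution. The probabilities below are invented purely for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A nearly deterministic next-token distribution carries little payload...
low = entropy_bits([0.97, 0.01, 0.01, 0.01])   # ~0.24 bits/token
# ...while a flat one over 4 tokens carries the maximum 2 bits/token.
high = entropy_bits([0.25, 0.25, 0.25, 0.25])  # 2.0 bits/token
```

This is the sense in which LLM text is a gift to steganography: its output distribution is genuinely high-entropy, unlike, say, boilerplate prose.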
I can see how steganography applied to images can result in hard-to-detect watermarks or provenance identifiers.
But I don't see how these can be effectively used in text content. Yes, an AI program can encode provenance identifiers by length of words, starting letters of sentences, use of specific suffixes, and other linguistic constructs.
However, say that I am a student with an AI-generated essay and want to make sure my essay passes the professor's plagiarism checker. Isn't it pretty easy to re-order clauses, substitute synonyms, and add new content? In fact, I think there is even a Chrome extension that does something like that.
Or maybe that is too much work for the lazy student who wants to completely rely on ChatGPT or doesn't know any better.
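For a concrete feel of how fragile these linguistic channels are, here's a toy sketch of the "starting letters of sentences" scheme mentioned above (the cover text is invented for the example):

```python
def extract_acrostic(text):
    """Recover a hidden word from the first letter of each sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "".join(s[0] for s in sentences)

cover = "How was the exam. Iffy, honestly. Don't ask. Everyone struggled."
hidden = extract_acrostic(cover)  # -> "HIDE"
```

Reordering the sentences or substituting a synonym for any sentence-initial word destroys the payload, which is exactly the paraphrase attack described above.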
Is this using steganography to create tamper-resistant watermarks (which would allow downstream users of possibly-AI-created material to prove that it was indeed created by an AI), or is it for something different?
nullc|2 years ago
pizza|2 years ago
kurthr|2 years ago
ajb|2 years ago
rahmeero|2 years ago
wanderingbit|2 years ago
wrycoder|2 years ago