(no title)
tyho | 1 year ago
To detect: Establish the same DRBG. Tokenize the text and, for each nth token, determine the red set of tokens for that position. If you see only red tokens across many of those positions, you can be confident the content is watermarked with your key.
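A rough sketch of that detection step, to make it concrete. Everything here is an assumption for illustration: the keyed HMAC stands in for the DRBG, `is_red` and `detect` are hypothetical names, token IDs are plain integers, and about half of all tokens are red at any position.

```python
import hashlib
import hmac
import math


def is_red(key: bytes, position: int, token_id: int) -> bool:
    """Hypothetical red-set test: a keyed hash of (position, token).

    Roughly half of all token IDs come out red at any given position.
    """
    msg = position.to_bytes(8, "big") + token_id.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & 1 == 1


def detect(key: bytes, token_ids: list[int], n: int = 4) -> tuple[int, int, float]:
    """Return (red hits, positions checked, z-score) over every nth position."""
    positions = range(0, len(token_ids), n)
    checked = len(positions)
    hits = sum(is_red(key, i, token_ids[i]) for i in positions)
    # Without a watermark, hits should look like Binomial(checked, 0.5).
    z = (hits - 0.5 * checked) / math.sqrt(0.25 * checked) if checked else 0.0
    return hits, checked, z
```

A large positive z-score means far more red tokens than chance would give. Real tokenizers and re-tokenization mismatches would complicate this.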
This would probably take a bit of fiddling to work well, but would be pretty much undetectable. Conceptually it's forcing the LLM to use a "flagged" synonym at key positions. A more sophisticated version of a shibboleth.
In practice you might choose to instead watermark all tokens, less heavy-handedly (nudge logits, rather than override), and use highly robust error-correcting codes.
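The "nudge, don't override" variant might look something like this. It is only a sketch under assumptions: the HMAC-based `is_red` is a hypothetical stand-in for the DRBG, `nudge_logits` is a made-up name, logits are a plain list indexed by token ID, and the bias `delta = 2.0` is an arbitrary illustrative value. The error-correcting-code part is not shown.

```python
import hashlib
import hmac


def is_red(key: bytes, position: int, token_id: int) -> bool:
    """Hypothetical red-set test: a keyed hash of (position, token)."""
    msg = position.to_bytes(8, "big") + token_id.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & 1 == 1


def nudge_logits(
    key: bytes, position: int, logits: list[float], delta: float = 2.0
) -> list[float]:
    """Add `delta` to every red token's logit; leave the rest untouched.

    The model still samples normally afterwards, so red tokens become
    more likely at this position without being forced.
    """
    return [
        logit + delta if is_red(key, position, token_id) else logit
        for token_id, logit in enumerate(logits)
    ]
```

A smaller `delta` distorts the output less but needs more tokens before the bias is statistically visible.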
jl6|1 year ago
drdeca|1 year ago
Though in this case it needs longer texts to have high significance (and when the entropy is low, it needs to be especially long).
But for most text (with typical amounts of entropy per token) apparently it doesn’t need to be that long? Like 25 words I think I heard?
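For a feel of why short texts can be enough, here is the arithmetic as a small sketch. The assumptions are mine, not from the figure quoted above: each checked position is red with probability 0.5 in unwatermarked text, positions are independent, and `false_positive_rate` is a hypothetical helper name.

```python
import math


def false_positive_rate(checked: int, hits: int, red_fraction: float = 0.5) -> float:
    """Chance that unwatermarked text shows at least `hits` red tokens
    among `checked` positions (binomial upper tail)."""
    return sum(
        math.comb(checked, k)
        * red_fraction**k
        * (1 - red_fraction) ** (checked - k)
        for k in range(hits, checked + 1)
    )


# 25 checked positions, all red: 0.5 ** 25, roughly 3 in 100 million.
print(false_positive_rate(25, 25))
```

Low-entropy positions, where the model has little real choice of token, carry less signal than this idealized count suggests, which is why such text needs to be longer.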
deadbabe|1 year ago