top | item 43026083

(no title)

pava0 | 1 year ago

For example?

discuss

tyho|1 year ago

A crude way: To watermark: First establish a keyed DRBG. For every nth token prediction: read a bit from the DRBG for every possible token to label them red/black. before selecting the next token, set the logit for black tokens to -Inf, this ensures a red token will be selected.

To detect: Establish the same DRBG. Tokenize, for each nth token, determine the red set of tokens in that position. If you only see red tokens in lots of positions, then you can be confident the content is watermarked with your key.

This would probably take a bit of fiddling to work well, but would be pretty much undetectable. Conceptually it's forcing the LLM to use a "flagged" synonym at key positions. A more sophisticated version of a shiboleth.

In practice you might chose to instead watermark all tokens, less heavy handedly (nudge logits, rather than override), and use highly robust error correcting codes.

jl6|1 year ago

It feels like this would only be feasible across longer passages of text, and some types of text may be less amenable to synonyms than others. For example, a tightly written mathematical proof versus a rambling essay. Biased token selection may be detectable in the latter (using a statistical test), and may cause the text to be irreparably broken in the former.

deadbabe|1 year ago

What if the entire LLM output isn’t used? For example, you ask the LLM to produce some long random preamble and conclusion with your actual desired output in between the two. Does it mess up the watermarking?