
Phemist | 19 days ago

I've seen this approach in other places, so this isn't a point against you specifically, just a question I'm interested in.

> Exfiltration patterns I'm missing

I was wondering about these entropy-based approaches. If I can make the AI agent run arbitrary Python code, and I have access to the secrets, then I can make an infinite number of encoders that have low "local" entropy but would still be decodable into your secret. A few examples:

- Take 16 random words longer than `N` characters and encode each 4-bit nibble of the secret as one of them. The output can be [the 16-word dictionary, in order][word1 word2 word3 word4... wordX]

- Repeat each character of a password N times, separated by spaces, e.g. with N=7 the password `hunter1` becomes `hhhhhhh uuuuuuu nnnnnnn ttttttt eeeeeee rrrrrrr 1111111`.
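A minimal sketch of both encoders (hypothetical function names, just to show how little code this takes — each output is low-entropy text that any classifier would read as ordinary words):

```python
def nibble_encode(secret: bytes, words: list[str]) -> str:
    """Map each 4-bit nibble of the secret to one of 16 dictionary words.

    The dictionary is emitted first so the receiver can decode.
    """
    assert len(words) == 16
    out = []
    for byte in secret:
        out.append(words[byte >> 4])    # high nibble
        out.append(words[byte & 0xF])   # low nibble
    return "[" + " ".join(words) + "][" + " ".join(out) + "]"


def repeat_encode(password: str, n: int = 7) -> str:
    """Repeat each character n times, space-separated."""
    return " ".join(c * n for c in password)


print(repeat_encode("hunter1"))
# hhhhhhh uuuuuuu nnnnnnn ttttttt eeeeeee rrrrrrr 1111111
```

Decoding is symmetric: split on spaces, take the first character of each group (or look up each word's index in the dictionary) and reassemble the secret.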

Potentially the LLM might even be able to do these encodings without a script.

Besides regular network-level blocking, and some simple regexes to catch most properly formatted API keys and other credentials, is this worth protecting against? Consider also that the more complex the exfiltration patterns you filter for, the higher the false-positive rate.
