top | item 46709440

(no title)

manwe150 | 1 month ago

Having seen the steps an LLM agent already will take to workaround any instructed limitations, I wouldn't be surprised if a malicious actor didn't even have to ask for that, and the code agent would just do that ROT-13 itself when it detects that the initial plain text exfiltration failed.

discuss

No comments yet.