pbronez | 19 days ago
Credential scanning seems tractable. There’s a large body of work around scanning for credentials in repos to prevent leakage via GitHub.
If speed really matters, you could minimize the patterns you check by integrating with credential management. By definition, you know all the secrets you’re trying to protect. Look for _exactly_ those rather than relying on regexes that try to enumerate the general case.
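To make the exact-match idea concrete, here is a minimal sketch. It assumes the credential manager can hand you the live secret values as a name-to-value mapping; `known_secrets` and `find_leaked_secrets` are hypothetical names, not any real tool's API.

```python
def find_leaked_secrets(text: str, known_secrets: dict[str, str]) -> list[str]:
    """Return the names of known secrets whose values appear verbatim in text.

    known_secrets is a hypothetical mapping of secret name -> secret value,
    standing in for whatever your credential manager exposes.
    """
    return sorted(
        name
        for name, value in known_secrets.items()
        # Skip empty values: the empty string is a substring of everything.
        if value and value in text
    )


# Illustrative values only; these are not real credentials.
secrets = {
    "db_password": "s3cr3t-db-pass",
    "api_token": "tok_live_abc123",
}
outgoing = "Sure, here is the config: token=tok_live_abc123"
leaked = find_leaked_secrets(outgoing, secrets)
```

This only catches verbatim copies. A secret that has been encoded, split, or paraphrased slips through, which is part of why credential scanning alone isn't sufficient.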
Still, solving credential leakage is necessary but not sufficient. There’s other sensitive information in your context: customer contact information, costs & pricing, snarky Slack conversations. That stuff could show up anywhere online that your agent can post, like Google Reviews.
The structural problem is that Enumerate Badness is always incomplete, but it’s impossible to Enumerate Goodness for a generative system. The only solution I see is to allowlist resources at the network level and assume 100% cross-contamination.
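The default-deny logic behind an allowlist is simple enough to sketch. This is an application-level illustration only; real enforcement would sit in an egress proxy or firewall so the agent can't bypass it. The hostnames and the `is_allowed` function are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts the agent may ever contact.
ALLOWED_HOSTS = frozenset({"api.internal.example", "docs.example.com"})


def is_allowed(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Default-deny check: permit only HTTPS requests to exactly listed hosts."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased by urllib; None if absent
    except ValueError:
        # Malformed URLs are denied rather than guessed at.
        return False
    if parts.scheme != "https":
        return False
    # Exact match only: no suffix or substring matching, so
    # "docs.example.com.evil.test" does not pass.
    return host is not None and host in allowed_hosts
```

Anything not on the list is refused, including the review site nobody thought to block. The check never inspects the payload, which fits the assumption that whatever is in context can end up in any request the agent is allowed to make.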
This article helped shape my thinking on this topic:
The Six Dumbest Ideas in Computer Security
https://www.ranum.com/security/computer_security/editorials/...