ryanrasti | 13 days ago
The answer is to constrain effects, not intent. You can define capabilities where agent behavior is constrained within reasonable limits (e.g., can't post private email to #general on Slack without consent).
The next layer is UX/feedback: you can compile additional policy as the user requests it (e.g., only this specific sender's emails can be sent to #general).
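To make that concrete, here is a rough sketch of a deny-by-default rule plus a user-approved exception. Everything in it (`Email`, `SlackPolicy`, `allow`, `can_post`) is a hypothetical name made up for illustration, not an existing API:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Email:
    sender: str
    body: str
    private: bool = True  # assumption: emails are private unless marked otherwise


@dataclass
class SlackPolicy:
    # (sender, channel) pairs the user has explicitly approved
    approved: set[tuple[str, str]] = field(default_factory=set)

    def allow(self, sender: str, channel: str) -> None:
        """Record a user-approved exception to the default-deny rule."""
        self.approved.add((sender, channel))

    def can_post(self, email: Email, channel: str) -> bool:
        """Allow non-private mail; private mail only for approved pairs."""
        if not email.private:
            return True
        return (email.sender, channel) in self.approved


policy = SlackPolicy()
alice_mail = Email(sender="alice@example.com", body="Q3 numbers")
bob_mail = Email(sender="bob@example.com", body="lunch?")

blocked_before = policy.can_post(alice_mail, "#general")
policy.allow("alice@example.com", "#general")  # user grants one exception
allowed_after = policy.can_post(alice_mail, "#general")
bob_still_blocked = not policy.can_post(bob_mail, "#general")
```

The exception is scoped to one sender and one channel, so approving Alice's emails for #general says nothing about anyone else's.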
botusaurus|13 days ago
Decades ago, secure OSes tracked the provenance of every byte (clean/dirty) to detect leaks, but that's hard to do if you want your agent to be useful.
ryanrasti|13 days ago
Yeah, you're hitting on the core tradeoff between correctness and usefulness.
The key differences here:
1. We're not tracking at the byte level but at the tool-call/capability level (e.g., read emails) and enforcing at egress (e.g., send emails).
2. The agent can slowly learn approved patterns from user behavior and common exceptions to strict policy. You can be strict at the start and give more autonomy for known-safe flows over time.
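A minimal sketch of those two points, assuming a per-session label set instead of per-byte taint. The tool names, `SOURCE_LABELS`, `SINK_BLOCKS`, and the `Session` class are all hypothetical, picked only to show the shape of the idea:

```python
from dataclasses import dataclass, field

# Hypothetical registry: the label a source tool adds to the session,
# and the labels each egress (sink) tool refuses by default.
SOURCE_LABELS = {"read_emails": "private_email"}
SINK_BLOCKS = {
    "send_email": {"private_email"},
    "post_slack": {"private_email"},
}


@dataclass
class Session:
    labels: set = field(default_factory=set)          # labels picked up so far
    approved_flows: set = field(default_factory=set)  # (label, sink) pairs

    def approve(self, label: str, sink: str) -> None:
        """Record a flow the user has approved (the 'learned' exceptions)."""
        self.approved_flows.add((label, sink))

    def call(self, tool: str) -> bool:
        """Return True if the tool call may proceed, False if blocked at egress."""
        risky = SINK_BLOCKS.get(tool, set()) & self.labels
        unapproved = {lbl for lbl in risky if (lbl, tool) not in self.approved_flows}
        if unapproved:
            return False
        if tool in SOURCE_LABELS:
            self.labels.add(SOURCE_LABELS[tool])
        return True


session = Session()
results = [
    session.call("send_email"),   # nothing private read yet
    session.call("read_emails"),  # session now carries "private_email"
    session.call("send_email"),   # blocked at egress
]
session.approve("private_email", "send_email")
results.append(session.call("send_email"))  # approved flow now passes
results.append(session.call("post_slack"))  # other sinks stay strict
```

This is coarser than byte-level tracking: once the session has read private email, every unapproved egress is blocked, whether or not the outgoing payload actually contains that data. That is the strict starting point; approvals loosen it one flow at a time.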
gostsamo|13 days ago
Exfiltrating info through GET requests won't be 100% stopped, but it will be hampered.
ATechGuy|13 days ago
zmmmmm|13 days ago
> you're hitting on the core tradeoff between correctness and usefulness
The question is: is it a completely unsupervised bot, or is a human in the loop? I kind of hope a human is not in the loop, given that it's such a caricature of LLM writing.