
root_axis | 11 days ago

Presumably the models would, at the very least, need major fine-tuning on this standard to prevent it from being circumvented through prompt injection.


alexgarden | 11 days ago

Actually, not really... proofing against prompt injection (malicious and "well intentioned") was part of my goal here.

What makes AAP/AIP so powerful is that even when prompt injection succeeds in causing the agent to attempt to do wrong, AIP intervenes with a [BOUNDARY VIOLATION] reminder in real time, in the very next thinking block.

As I said earlier, not a guarantee, but so far, in my experience, pretty damn robust. The only thing more secure than real-time thinking-block monitoring would be integration inside the LLM provider's own process, but that would be a nightmare to integrate and would remain proprietary unless the providers could all agree on a standard that didn't compromise one of them. Seems improbable.