It's a neat idea but it's not exactly plausible real world conditions to have an agent that pretty much exclusively spends its time wading through an email inbox that's 99% repeated prompt injection attempts. As the creator acknowledges in the original thread, its context/working memory is going to be unusually cognizant of prompt injection risk at any given time vs. a more typical helpful agent "mindset" while fulfilling normal day-to-day requests. Where a malicious prompt might be slipped in via any one of dozens of different infiltration points without the convenience of a static "prompt injection inbox".
No comments yet.