top | item 46913852

(no title)

joozio | 24 days ago

Haven't benchmarked pre-processing approaches yet, but that's a natural next step. Right now the test page targets raw agent behavior — no middleware. A comparison between raw vs sanitized pipelines against the same attacks would be really useful. The multi-layer attack (#10) would probably be the hardest to strip cleanly since it combines structural hiding with social engineering in the visible text.

discuss

the_harpia_io|24 days ago

Yeah, the social engineering + structural combination is brutal to defend against. You can strip the technical hiding but the visible prompt injection still works on the model. Would be interesting to see how much of the ~70% success rate drops with just basic sanitization (strip comments, normalize whitespace, remove zero-width) vs more aggressive stripping.

If you build out a v2 with middleware testing, a leaderboard by framework would be killer. "How manipulation-proof is [Langchain/AutoGPT/etc] out of the box vs with basic defenses" would get a lot of attention.