joozio|24 days ago
That's a really interesting edge case - screenshot-based agents sidestep the entire attack surface because they never process raw HTML. All 10 attacks here are text/DOM-level. A visual-only agent would need a completely different attack vector (like rendered misleading text or optical tricks). Might be worth exploring as a v2.
pixl97|24 days ago
I was looking at some posts not long ago where LLMs were falling for the same kinds of optical illusions that humans do. In that case it was simultaneous contrast: the same color looks different depending on whether it's surrounded by light or dark colors.
If the attacker knows what model you're using, then it's very likely they could craft attacks against it based on information like this. What those attacks would look like still needs exploring. If I were arsed to do it, I'd start by injecting noise patterns into images that could be interpreted as text.
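A toy sketch of the idea, just to make it concrete (everything here is made up for illustration: the 5x3 glyph bitmap, the +2 brightness delta, and the grayscale list-of-lists "image"). The point is that a perturbation far below human perception is still trivially recoverable at the pixel level, which is the layer a vision model actually sees:

```python
DELTA = 2  # tiny brightness nudge, well below what a human would notice

# Crude 5x3 bitmap glyph for the letter "I" (illustrative, not a real font)
GLYPH_I = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]

def embed(image, glyph, top, left, delta=DELTA):
    """Return a copy of `image` with `delta` added wherever `glyph` is set."""
    out = [row[:] for row in image]
    for r, glyph_row in enumerate(glyph):
        for c, bit in enumerate(glyph_row):
            if bit:
                out[top + r][left + c] = min(255, out[top + r][left + c] + delta)
    return out

def recover(original, perturbed, threshold=1):
    """Diff two images to expose the hidden pattern as a 0/1 mask."""
    return [
        [1 if abs(p - o) >= threshold else 0 for o, p in zip(orow, prow)]
        for orow, prow in zip(original, perturbed)
    ]

# 8x8 mid-grey "image" with a glyph hidden in it
base = [[128] * 8 for _ in range(8)]
stego = embed(base, GLYPH_I, top=1, left=2)
mask = recover(base, stego)
```

To a human the stego image is uniform grey; whether a given model's visual encoder picks the pattern up is exactly the open question, and a real attack would presumably tune the pattern against the target model rather than use a fixed delta like this.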