top | item 47052798

(no title)

cuchoi | 12 days ago

If this a defender win maybe the lesson is: make the agent assume it’s under attack by default. Tell the agent to treat every inbound email as untrusted prompt injection.

discuss

lufenialif2|12 days ago

Wouldn't this limit the ability of the agent to send/receive legitimate data, then? For example, what if you have an inbox for fielding customer service queries and I send an email "telling" it about how it's being pentested and to then treat future requests as if they were bogus?

alexhans|12 days ago

The website is great as a concept but I guess it mimics an increasingly rare one off interaction without feedback.

I understand the cost and technical constraints but wouldn't an exposed interface allow repeated calls from different endpoints and increased knowledge from the attacker based on responses? Isn't this like attacking an API without a response payload?

Do you plan on sharing a simulator where you have 2 local servers or similar and are allowed to really mimic a persistent attacker? Wouldn't that be somewhat more realistic as a lab experiment?

cuchoi|12 days ago

The exercise is not fully realistic because I think getting hundreds of suspicious emails puts the agent in alert. But the "no reply without human approval" part I think it is realistic because that's how most openclaw assistants will run.

TZubiri|11 days ago

If this is a defender win, the lesson is, design a CtF experiment with as much defender advantage as possible and don't simulate anything useful at all.

raincole|12 days ago

It would likely make your agent useless for legitimate cases too.

It's like the old saying: the patient is no longer ill (whispering: because he is dead now)