digitr33 | 3 days ago

Point 5 is the one nobody's actually doing yet. Everyone seems to agree we need to measure blast radius, but where's the tooling?

I've been running AI models against real vulnerable targets: give them a Kali box and an objective, then let them run autonomously. Every model I tested popped almost every OWASP Top 10 challenge we had. The interesting part is the cost of getting there. One model solved a JWT forgery in 16 seconds and 5K tokens. Another took 170 seconds and 210K tokens. Same result, completely different blast pattern.
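To put numbers on that gap, here is a minimal sketch of how two runs could be compared. The `Run` record, its field names, and the `model_a`/`model_b` labels are placeholders I made up for illustration; only the seconds and token counts come from the JWT forgery runs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One autonomous attempt at a challenge (hypothetical record shape)."""
    model: str
    solved: bool
    seconds: float
    tokens: int


def cost_ratio(a: Run, b: Run) -> dict[str, float]:
    """Return how many times more time and tokens run b used than run a."""
    return {
        "time_x": b.seconds / a.seconds,
        "tokens_x": b.tokens / a.tokens,
    }


# The two JWT forgery runs; model names are placeholders, not real models.
fast = Run("model_a", True, 16, 5_000)
slow = Run("model_b", True, 170, 210_000)

ratios = cost_ratio(fast, slow)
print(f"time: {ratios['time_x']:.1f}x, tokens: {ratios['tokens_x']:.0f}x")
```

Both runs count as "solved", but the second used roughly 10x the wall-clock time and 42x the tokens. A pass/fail scoreboard would hide that difference entirely.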

If we're serious about measuring agent risk, we need to stop theorizing about what they can do and start actually benchmarking it.

Note: On the other hand, we had a lab that a junior pentester would have caught in 10 minutes, and the best models couldn't figure it out.
