top | item 47056759

Show HN: AI pentester – verified exploits, $999/assessment

3 points | gauravbsinghal | 12 days ago

I spent 20 years in security, most recently leading 100+ engineers at AWS building pentesting infrastructure across thousands of services. The same problem everywhere: pentests take weeks, cost $15-50k, and the results are stale before they ship.

I built Cipher to fix that. It's an AI agent that reasons like an attacker — maps the target, finds vulnerabilities, chains them into exploits, and proves they're real. Every finding ships with a reproducible Python script. If the script doesn't break your system, we don't report it.

How it works: Cipher defines security invariants ("User A can't access User B's data"), then multiple agents attack in parallel to violate them. A separate judge agent tries to disprove every finding — if it can't reproduce the exploit 3 times, the finding dies. You never see it.
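The verification gate described above can be sketched in a few lines. This is a hypothetical illustration, not Cipher's actual code: the `Finding` type, `judge` function, and the exploit callable are all made up to show the "reproduce 3 times or the finding dies" rule.

```python
# Hypothetical sketch of the judge gate: a finding only survives if its
# exploit reproduces every time across the required number of runs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    title: str
    exploit: Callable[[], bool]  # returns True if the exploit succeeded

def judge(finding: Finding, required_repros: int = 3) -> bool:
    """Accept the finding only if the exploit reproduces every run."""
    for _ in range(required_repros):
        if not finding.exploit():
            return False  # a single failed reproduction kills the finding
    return True

# A stable exploit passes; a flaky one never reaches the report.
stable = Finding("IDOR on /orders", exploit=lambda: True)
flaky = Finding("race condition", exploit=lambda: False)
print(judge(stable), judge(flaky))  # True False
```

The point of the design is that the report is filtered by reproduction, not by model confidence.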

$999 per assessment. Results in ~2 hours. Unlimited retesting.

Honest limitations: complex multi-step auth flows (e.g., SSO with MFA) still need manual setup, such as providing JWT credentials. We're working on it.

I'll run Cipher free for the first 15 HN readers who want to try it. Drop your email or sign up at https://apxlabs.ai/. Happy to answer any questions about the approach.

3 comments


tonetegeatinst | 12 days ago

Are you able to share what models or fine tuning you did for the agents?

I'm currently studying security in college, and most of my time is spent building a good system card and premade prompts for certain situations, like using nmap or Burp Suite.

gauravbsinghal | 12 days ago

Great question. We use frontier models (Claude, Gemini class) without fine-tuning. The insight that changed everything for us: prompt engineering alone hits a ceiling fast for offensive security.

What matters more than the model:

1. Architecture over prompts. Cipher isn't one agent with a great prompt — it's multiple agents with distinct roles (recon, attack, verification) that coordinate. The "judge" agent that tries to disprove findings is more important than the attacker agent.

2. Tool use over reasoning. The model doesn't "know" how to pentest — it reasons about what tool to use next based on what it's learned so far. We give it real tools (not simulated ones) and let it chain them.

3. Invariant-based testing over checklist-based. Instead of "try SQLi on every input," Cipher defines security properties ("User A can't access User B's data") and tries to violate them. This catches logic bugs that no scanner finds.
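Point 3 can be made concrete with a toy example. Everything below is illustrative (the order store, endpoint, and invariant names are invented), but it shows the shape of the idea: state a security property as a predicate, then let an attacker try to make it false.

```python
# Toy "service" with a deliberate IDOR-style bug: ownership is never
# checked, so any user can read any order.
ORDERS = {101: {"owner": "alice"}, 102: {"owner": "bob"}}

def get_order(requesting_user: str, order_id: int):
    # BUG: requesting_user is ignored entirely.
    return ORDERS.get(order_id)

def invariant_no_cross_tenant_read(user: str, other: str) -> bool:
    """Invariant: `user` can never read an order owned by `other`."""
    for oid, order in ORDERS.items():
        if order["owner"] == other and get_order(user, oid) is not None:
            return False  # violation: cross-tenant read succeeded
    return True

# An attack agent's job is to find inputs that drive the invariant to False.
print(invariant_no_cross_tenant_read("alice", "bob"))  # False
```

A signature-based scanner matching on payloads would never flag this, because nothing here looks like an injection; the bug only exists relative to the stated property.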

Since you're studying security — the best thing you can do is get really good at manual pentesting first. Understanding why an attack chain works is what lets you build agents that reason about it. The prompts matter less than the mental model you encode into the system's architecture.

Happy to chat more — feel free to DM or join our Discord.

gus_massa | 10 days ago

Is there a free/cheap version that runs a minimal set of tests?