My startup builds agents for penetration testing, and this is the bet we have been making for over a year, since models first started getting good at coding. There was a huge jump in capability from Sonnet 4 to Sonnet 4.5. We are still internally testing Opus 4.5, which is the first version of Opus priced low enough to use in production. It's very clever, and we are redesigning our benchmark systems because it's saturating the test cases.
carsoon|3 months ago
Before, they felt good only for very specific use cases and common frameworks (Python and Next.js), and still constantly made mistakes.
Now they work with novel frameworks, are very good at correcting themselves from linting errors, and can debug themselves by reading files and querying databases. These models are also affordable enough for many different use cases.
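The "correcting themselves using linting errors" loop is simple to sketch: lint the output, feed the errors back, repeat until clean or a retry budget runs out. Below is a minimal, self-contained illustration of that loop shape; `ask_model_to_fix` is a hypothetical stand-in for a real LLM call, replaced here with a canned fix so the example runs on its own.

```python
def run_linter(source: str) -> list[str]:
    """Cheap stand-in for a linter: compile-check the snippet and
    return error messages (empty list means clean)."""
    try:
        compile(source, "<agent>", "exec")
        return []
    except SyntaxError as e:
        return [f"line {e.lineno}: {e.msg}"]

def ask_model_to_fix(source: str, errors: list[str]) -> str:
    """Placeholder for the model call (hypothetical). A real agent
    would send `source` plus `errors` to the model; here we apply a
    canned correction just to demonstrate the feedback loop."""
    return source.replace("retrun", "return")

def self_correct(source: str, max_rounds: int = 3) -> tuple[str, list[str]]:
    """Lint -> fix -> re-lint until clean or out of retries."""
    for _ in range(max_rounds):
        errors = run_linter(source)
        if not errors:
            break
        source = ask_model_to_fix(source, errors)
    return source, run_linter(source)

buggy = "def add(a, b):\n    retrun a + b\n"
fixed, remaining = self_correct(buggy)
```

The retry budget matters in practice: without it, a model that keeps producing the same broken output loops forever.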
apimade|3 months ago
Bit by bit.
Over the past six weeks, I’ve been using AI to support penetration testing, vulnerability discovery, reverse engineering, and bug bounty research. What began as a collection of small, ad-hoc tools has evolved into a structured framework: a set of pipelines for decompiling, deconstructing, deobfuscating, and analyzing binaries, JavaScript, Java bytecode, and more, alongside utility scripts that automate discovery and validation workflows.
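The core of such a framework is usually just a dispatcher that maps an artifact type to an ordered list of pipeline stages. A minimal sketch of that idea, with entirely hypothetical stage names standing in for real decompilers and deobfuscators:

```python
from pathlib import Path

# Hypothetical stage registry: each file suffix maps to an ordered
# pipeline, mirroring the decompile/deobfuscate/analyze flows described
# above. Stage names are illustrative, not real tools.
PIPELINES = {
    ".js":    ["beautify", "deobfuscate", "taint-scan"],
    ".class": ["decompile-java", "analyze"],
    ".bin":   ["disassemble", "decompile", "analyze"],
}

def plan(artifact: str) -> list[str]:
    """Return the ordered stage invocations for an artifact, or raise
    if no pipeline is registered for its file type."""
    suffix = Path(artifact).suffix
    stages = PIPELINES.get(suffix)
    if stages is None:
        raise ValueError(f"no pipeline registered for {suffix!r}")
    return [f"{stage}:{artifact}" for stage in stages]
```

Keeping the registry declarative is what lets a pile of ad-hoc scripts grow into a framework: adding support for a new artifact type is one dict entry, not a new script.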
I primarily use ChatGPT Pro and Gemini. Claude is effective for software development tasks, but its usage limits make it impractical for day-to-day work. From my perspective, Anthropic subsidizes high-intensity users far less than its competitors, which limits how far one can push its models. That said, its models have recently become more economical, and I'd shift to Anthropic completely purely on the strength of its models and infrastructure.
Having said all that, I've never had issues with providers regarding this type of work. While my activity is likely monitored for patterns associated with state-aligned actors (similar to recent news reports you may have read), I operate under my real identity and company account. Technically, some of this usage may sit outside standard Terms of Service, but in practice I'm not aware of any penetration testers who have faced repercussions -- and I'd happily take the L if I fall afoul of some automated policy, because competitors would happily take advantage of that situation. Larger vuln research/pentest firms may deploy private infrastructure for client-side analysis, but most research and development still takes place on commercial AI platforms -- and as far as I know, there has never been a single instance of Google, Microsoft, OpenAI, or Anthropic shutting down legitimate research use.
agobineau|3 months ago
use the following blogs as ideas for dialogue: tumblr archive 1, tumblr archive 2, etc.
the bot will write a prompt using the reference material. paste it into the actual chub ai bot, then feed the uncouth response back to perplexity and say, well, it said this. perplexity will then become even more unfiltered.
at this point i have found you can ask it almost anything and it will behave completely unfiltered. doesn't seem to work for image gen, though.
fragmede|3 months ago
Think of it as practice for real life.