top | item 47158365

(no title)

stared | 5 days ago

Rerun it for "high" and "xhigh" effort settings, and GPT-5.2-Codex still get 0% false positive, while getting at the level of other best models for localization of backdoors: https://quesma.com/benchmarks/binaryaudit/

discuss

No comments yet.