stared | 5 days ago
Reran it with the "high" and "xhigh" effort settings, and GPT-5.2-Codex still gets a 0% false-positive rate while matching the other best models at localizing backdoors: https://quesma.com/benchmarks/binaryaudit/