(no title)
stonepresto | 9 months ago
But what if there's a missing piece of the puzzle that the author and devs missed or assumed o3 covered, but in fact was out of o3's context, that would invalidate this vulnerability?
I'm not saying there is, nor am I going to take the time to do the author's work for them, rather I am saying this report is not fully validated which feels like a dangerous precedent to set with what will likely be an influential blog post in the LLM VR space moving forward.
IMO the idea of PoC || GTFO should be applied more strictly than ever before to any vulnerability report generated by a model.
The underlying perspective that o3 is much better than previous or other current models still remains, and the methodology is still interesting. I understand the desire and need to get people to focus on something by wording it a specific way, it's the clickbait problem. But dammit, do better. Build a PoC and validate your claims, don't be lazy. If you're going to write a blog post that might influence how vulnerability researchers conduct their research, you should promote validation and not theoretical assumption. The alternative is the proliferation of ignorance through false-but-seemingly-true reporting, versus deepening the community's understanding of a system through vetted and provable reports.
seanheelan|9 months ago
stonepresto|9 months ago
lyu07282|9 months ago
stonepresto|9 months ago
1) If it is actually a UAF or if there is some other mechanism missing from the context that prevents UAF. 2) The category and severity of the vulnerability. Is it even a DoS, RCE, or is the only impact causing a thread to segfault?
This is all part of the standard vulnerability research process. I'm honestly surprised it got merged in without a PoC, although with high profile projects even the suggestion of a vulnerability in code that can clearly be improved will probably end up getting merged.