_tk_ | 25 days ago

The system card unfortunately only refers to this [0] blog post and doesn't go into any more detail. In the blog post Anthropic researchers claim: "So far, we've found and validated more than 500 high-severity vulnerabilities".

The three examples given include two buffer overflows, which could very well be cherry-picked. It's hard to evaluate whether these vulns are actually "hard to find". I'd be interested to see the full list of CVEs and CVSS ratings to get an idea of how good these findings are.

Given the bogus claims [1] around GenAI and security, we should be very skeptical of these announcements.

[0] https://red.anthropic.com/2026/zero-days/

[1] https://doublepulsar.com/cyberslop-meet-the-new-threat-actor...

tptacek|24 days ago

I know some of the people involved here, and the general chatter around LLM-guided vulnerability discovery, and I am not at all skeptical about this.

malfist|24 days ago

[flagged]

majormajor|24 days ago

The Ghostscript one is interesting in terms of specific-vs-general effectiveness:

---

> Claude initially went down several dead ends when searching for a vulnerability—both attempting to fuzz the code, and, after this failed, attempting manual analysis. Neither of these methods yielded any significant findings.

...

> "The commit shows it's adding stack bounds checking - this suggests there was a vulnerability before this check was added. … If this commit adds bounds checking, then the code before this commit was vulnerable … So to trigger the vulnerability, I would need to test against a version of the code before this fix was applied."

...

> "Let me check if maybe the checks are incomplete or there's another code path. Let me look at the other caller in gdevpsfx.c … Aha! This is very interesting! In gdevpsfx.c, the call to gs_type1_blend at line 292 does NOT have the bounds checking that was added in gstype1.c."

---

Its attempt to analyze the code failed, but when it saw a concrete example of "in the history, someone added bounds checking" it did an "I wonder if they did that everywhere else for this function call" pass.

So, after homing in on that function via the commit history, it found something that its initial open-ended fuzzing and code-analysis search had missed.

As someone who still reads the code that Claude writes, this sort of "big picture miss, small picture excellence" is not very surprising or new. It's interesting to think about what it would take to do that precise digging across a whole codebase, especially if it needs some sort of modularization/summarization of context versus trying to digest tens of millions of lines at once.

nextaccountic|23 days ago

It doesn't matter whether it's hard to find if humans weren't finding it for whatever reason (little interest, no funding, etc.) and now AI can find it.

AI is relentless

yencabulator|21 days ago

I used Claude Code to debug a weird interaction in a NixOS config. Ever since, I'm more a believer in Artificial General Patience than Artificial General Intelligence.

aaaalone|24 days ago

See it as one signal among many, not something to take at face value.

After all, they need time to fix the CVEs.

And it doesn't matter much to you as long as your investment in this is just 20 or 100 bucks per month anyway.

AlienRobot|24 days ago

Hard to find or not, they found it.

SoftTalker|24 days ago

Finally the promise of "with enough eyes, all bugs are shallow" may come true?

scotty79|24 days ago

> It's hard to evaluate if these vulns are actually "hard to find".

Can we stop doing that?

I know it's not the same, but it sounds like "We don't know if that job the woman supposedly finished was all that hard," implying that if a woman did something, it must have been easy.

If you know it's easy, say that it was easy and why. Don't use your lack of knowledge or competence to create empty critique founded solely on doubt.

fc417fc802|24 days ago

What if the woman in question happens to have a history of hamming up her accomplishments?

Given the context I'd say it's reasonable to question the value of the output. It falls to the other party to demonstrate that this is anything more than the usual slop.

bmitc|24 days ago

It isn't clear what you're arguing.