jfaganel99's comments

jfaganel99 | 13 days ago | on: Vulnerabilities in 45 Open Source Projects (vLLM, Langfuse, Phase, NocoDB)

That's a great question. This is how I would think about it:

The number of vulnerabilities by itself doesn't mean much. It has more to do with the size of the codebase and the attack surface than with the quality of the code. There is a big difference between 10 findings in 500 lines and 10 findings in 500k lines.

What matters more:

1. How severe it is and how easy it is to exploit. An auth bypass is not the same as a theoretical timing attack. Check whether the vulnerable code paths are actually reachable in your deployment.

2. The strongest signal is the maintainer's response. How quickly do they reply? Do they take the findings seriously or ignore them? A project that fixes problems quickly and engages constructively is in much better shape than one with no findings and no security process. For Langfuse specifically, they agreed with two of the findings and accepted two others as known risks. That's a reasonable response. The V4 non-response is worth following up on, but maintainers are busy and things get missed.

3. The kind of bugs matters. Isolated logic errors, like the ones we found, are normal in any codebase. What you don't want to see is the same class of vulnerability recurring, because that points to a systemic gap.

A project only shows up in our results because it's popular enough for us to look at. I'd be more worried about projects that have never had any security review at all.

If you want to know more about Langfuse specifically, you can find all the details on the site: https://www.kolega.dev/security-wins/

jfaganel99 | 13 days ago | on: Show HN: Skill or Kill – Can you spot the malicious AI agent skill?

Hi HN - this is a side project of mine.

After reading about the ClawHavoc campaign and seeing how fast malicious skills were spreading on ClawHub (1,100+ at last count), I figured it would be useful to have something where people can actually practice telling the difference between a legit skill and a bad one.

The game gives you realistic skill snippets. Some are safe, some are modeled on real attack patterns - fake driver installs, hidden bash execution, credential pass-through to the LLM context window. You classify each one under time pressure and get feedback on what you missed and why.

5 rounds, runs in the browser, no signup.

Happy to talk about the attack patterns or how I put the scenarios together.

jfaganel99 | 27 days ago | on: Vulnerabilities in 45 Open Source Projects (vLLM, Langfuse, Phase, NocoDB)

Author here. We built a security scanner called Kolega that does semantic analysis instead of pattern matching. To see if it actually worked, we ran it against 45 open source projects and reported what it found through responsible disclosure.

225 vulnerabilities. 41 reviewed by maintainers so far, 37 accepted, 4 rejected. 90% acceptance rate.

The bugs weren't exotic. They were things like:

if not user_id is not None - a double negative in Phase that means the permission check never runs. Nine auth bypasses total.
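To see why this pattern slips past review, here's a minimal sketch (variable names are illustrative, not Phase's actual code). Because `is not` binds tighter than `not`, the whole expression collapses to a plain None check that reads like its opposite:

```python
# Python precedence: `is not` binds tighter than `not`, so
# `not x is not None` parses as `not (x is not None)`, i.e. `x is None`.
for user_id in ("alice", None):
    assert (not user_id is not None) == (user_id is None)

# A reviewer skimming `if not user_id is not None:` can easily read it as
# "if user_id is not None" - the exact opposite of what it evaluates to.
```

The code is valid Python either way, which is why a pattern matcher has nothing to flag.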

torch.load() without weights_only=True in vLLM - RCE via pickle deserialization in one of the most popular inference frameworks.
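The mechanism behind that RCE doesn't need torch to demonstrate - torch.load() uses pickle under the hood, and unpickling attacker-controlled bytes executes code. A stdlib-only sketch of why (with a benign callable standing in for the attacker's payload):

```python
import pickle

# Any object whose __reduce__ returns (callable, args) gets that callable
# invoked during unpickling. An attacker would ship os.system instead of len.
class Payload:
    def __reduce__(self):
        return (len, ("attacker-controlled",))  # benign stand-in

result = pickle.loads(pickle.dumps(Payload()))  # len(...) runs here
assert result == len("attacker-controlled")
```

Passing weights_only=True tells torch.load() to use a restricted unpickler that only reconstructs tensor data, which is why its absence on untrusted checkpoints matters.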

RestrictedPython sandbox in Agenta where __import__ was explicitly added to safe_builtins. Four different escape routes to arbitrary code execution.
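For readers unfamiliar with builtins-based sandboxing, here's a minimal sketch of why exposing __import__ defeats it (this is an illustration, not Agenta's actual code):

```python
# A toy sandbox: evaluate user code with a restricted builtins table.
safe_builtins = {"len": len, "range": range}
safe_builtins["__import__"] = __import__  # the one line that breaks it

# With __import__ available, user code can pull in any module -
# os.system, subprocess, etc. math.sqrt is the benign stand-in here.
escape = "__import__('math').sqrt(16)"
assert eval(escape, {"__builtins__": safe_builtins}) == 4.0
```

Once any import primitive is reachable, the rest of the sandbox is decoration.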

SQL injection in NocoDB's Oracle client - Semgrep scanned the same codebase and found 222 issues, 208 of which were false positives, and missed this one entirely.
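Oracle specifics aside, the general shape of this bug class - and the fix - can be sketched with stdlib sqlite3 (table and input are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")
conn.execute("INSERT INTO users VALUES ('bob', 1)")

attacker = "alice' OR '1'='1"

# Vulnerable: string concatenation lets the input rewrite the query,
# so the OR clause matches every row.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + attacker + "'"
).fetchall()
assert len(rows) == 2

# Safe: a placeholder keeps the input as data, so nothing matches.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker,)
).fetchall()
assert rows == []
```

Both queries are syntactically fine, which again is why this lives in the semantic gap that pattern matchers miss.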

The interesting part to me wasn't that we found bugs. It's that these are all syntactically correct - the code compiles, runs, looks fine in review. The problems are semantic. No pattern matcher catches not X is not None because it's valid Python. You have to understand what the developer intended.

Every finding is published with full details - code locations, CWEs, PR numbers, disclosure timelines: https://www.kolega.dev/security-wins/

135 findings are still waiting on maintainer response. 4 were rejected - some we thought were exploitable, maintainers disagreed. We document those too.

Happy to discuss specifics on any of the projects or argue about methodology.
