top | item 46932281

(no title)

mcintyre1994 | 21 days ago

I don’t think there’s any solution to what SimonW calls the lethal trifecta with it, so I’d say that’s still pretty impossible.

I saw on The Verve that they partnered with the company that repeatedly disclosed security vulnerabilities to try to make skills more secure though which is interesting: https://openclaw.ai/blog/virustotal-partnership

I’m guessing most of that malware was really obvious, people just weren’t looking, so it’s probably found a lot. But I also suspect it’s essentially impossible to actually reliably find malware in LLM skills by using an LLM.

discuss

veganmosfet|21 days ago

Regarding prompt injection: it's possible to reduce the risk dramatically by: 1. Using opus4.6 or gpt5.2 (frontier models, better safety). These models are paranoid. 2. Restrict downstream tool usage and permissions for each agentic use case (programmatically, not as LLM instructions). 3. Avoid adding untrusted content in "user" or "system" channels - only use "tool". Adding tags like "Warning: Untrusted content" can help a bit, but remember command injection techniques ;-) 4. Harden the system according to state of the art security. 5. Test with red teaming mindset.

sathish316|21 days ago

Anyone who thinks they can avoid LLM Prompt injection attacks should be asked to use their email and bank accounts with AI browsers like Comet.

A Reddit post with white invisible text can hijack your agent to do what an attacker wants. Even a decade or 2 back, SQL injection attacks used to require a lot of proficiency on the attacker and prevention strategies from a backend engineer. Compare that with the weak security of so called AI agents that can be hijacked with random white text on an email or pdf or reddit comment

habinero|21 days ago

> Adding tags like "Warning: Untrusted content" can help

It cannot. This is the security equivalent of telling it to not make mistakes.

> Restrict downstream tool usage and permissions for each agentic use case

Reasonable, but you have to actually do this and not screw it up.

> Harden the system according to state of the art security

"Draw the rest of the owl"

You're better off treating the system as fundamentally unsecurable, because it is. The only real solution is to never give it untrusted data or access to anything you care about. Which yes, makes it pretty useless.

madeofpalk|21 days ago

Honestly, 'malware' is just the beginning it's combining prompt injection with access to sensitive systems and write access to 'the internet' is the part that scares me about this.

I never want to be one wayward email away from an AI tool dumping my company's entire slack history into a public github issue.