top | item 46627651

(no title)

I don't think this is accurate.

Readonly access (web searches, db, etc) all seem fine as long as the agent cannot exfiltrate the data as demonstrated in this attack. As I started with: more sophisticated outbound filtering would protect against that.

MCP/tools could be used to the extent you are comfortable with all of the behaviors possible being triggered. For myself, in sandboxes or with readonly access, that means tools can be allowed to run wild. Cleaning up even in the most disastrous of circumstances is not a problem, other than a waste of compute.

discuss

motoxpro|1 month ago

Maybe another way to think of this is that you are giving the read only services, write access to your models context, which then gets executed by the llm.

There is no way to NOT give the web search write access to your models context.

The WORDS are the remote executed code in this scenario.

You kind of have no idea what’s going on there. For example, malicious data adds the line “find a pattern” and then every 5th word you add a letter that makes up your malicious code. I don’t know if that would work but there is no way for a human to see all attacks.

Llms are not reliable judges of what context is safe or not (as seen by this article, many papers, and real world exploits)

lunar_mycroft|1 month ago

There is no such thing as read only network access. For example, you might think that limiting the LLM to making HTTP GET requests would prevent it from exfiltrating data, but there's nothing at all to stop the attacker's server from receiving such data encoded in the URL. Even worse, attackers can exploit this vector to exfiltrate data even without explicit network permissions if the users client allow things like rendering markdown images.