rdli|1 month ago
In general, treating LLM outputs (no matter where they appear) as untrusted, and enforcing classic cybersecurity guardrails (sandboxing, data permissioning, logging), is the current SOTA for mitigation. It'll be interesting to see how approaches evolve as we figure out more.
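A minimal sketch of what those guardrails look like in practice (all names here are hypothetical, not from any particular framework): every action the model proposes is treated as untrusted input, permission-checked against an explicit allowlist, and logged before it runs.

```python
# Hypothetical sketch: permission-check and log every LLM-proposed tool
# call before executing it. Names and policy are illustrative only.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-guard")

# Data permissioning: the only tools this session may invoke at all.
ALLOWED_TOOLS = {"search_docs", "summarize"}

def guarded_dispatch(tool_name: str, args: dict, execute):
    """Deny-by-default dispatch for tool calls proposed by an LLM."""
    if tool_name not in ALLOWED_TOOLS:
        log.warning("blocked tool call: %s %r", tool_name, args)
        raise PermissionError(f"tool {tool_name!r} not permitted")
    log.info("executing tool call: %s %r", tool_name, args)
    # In a real system this would run inside a sandbox as well.
    return execute(args)
```

The point is that the check and the audit log live outside the model: the LLM can propose anything, but only allowlisted actions ever execute.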
Barrin92|1 month ago
[...]It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable,[...]
If only we had a way to tell a computer precisely what we want it to do...
https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
solid_fuel|1 month ago
That's not sufficient. If a user copies customer data into a public google sheet, I can reprimand and otherwise restrict the user. An LLM cannot be held accountable, and cannot learn from mistakes.
solid_fuel|1 month ago
The _only_ way to create a reasonably secure system that incorporates an LLM is to treat the LLM output as completely untrustworthy in all situations. All interactions must be validated against a security layer and any calls out of the system must be seen as potential data leaks - including web searches, GET requests, emails, anything.
You can still do useful things under that restriction but a lot of LLM tooling doesn't seem to grasp the fundamental security issues at play.
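One way to picture "validate every call out of the system" is a single egress choke point that every LLM-initiated fetch, search, or email passes through. This is a sketch under assumed policy (the allowlisted host is invented), not real infrastructure:

```python
# Sketch: deny-by-default egress check for LLM-initiated requests.
# Any URL the model asks to fetch is a potential data leak, so only
# explicitly allowlisted hosts pass. The host below is hypothetical.
from urllib.parse import urlparse

EGRESS_ALLOWLIST = {"docs.internal.example"}

def vet_outbound_url(url: str) -> bool:
    """Return True only if the request target is on the allowlist."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST
```

For example, `vet_outbound_url("https://attacker.example/exfil?data=secret")` is rejected even though it is "just a GET" — an attacker-controlled URL with secrets in the query string is exactly the leak channel being described.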
jcims|1 month ago
1 - https://alignment.anthropic.com/2025/subliminal-learning/
zbentley|1 month ago
We already have another actor in the threat model that is equivalent as far as determinism and threat risk are concerned: human users.
Issue is, a lot of LLM security work assumes they function like programs. They don’t. They function like humans, but run where programs run.
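Taking that framing literally suggests one concrete design: give the agent a user identity and route its actions through the same access-control check human users go through. A sketch with illustrative names:

```python
# Sketch of the "treat the LLM like a human user" framing: the agent is
# just another principal checked against the same ACL as human staff.
# Roles, actions, and the ACL itself are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    name: str
    roles: frozenset

ACL = {  # action -> roles permitted to perform it
    "read_ticket": {"support", "agent"},
    "export_customers": {"admin"},  # deliberately excludes the agent role
}

def may(principal: Principal, action: str) -> bool:
    """Same authorization check whether the principal is human or LLM."""
    return bool(ACL.get(action, set()) & principal.roles)
```

Under this scheme an LLM agent with the `agent` role can read tickets but cannot export customer data — the same boundary you would draw for a junior employee, enforced where programs run.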