"IBM Bob is IBM’s new coding agent, currently in Closed Beta. "
Promptarmor did a similar attack(1) on Google's Antigravity that is also a beta version. Since then, they added secure mode(2).
These are still beta tools. When the tools are ready, I'd argue that they will probably be safer out of the box compared to a whole lot of users that just blindly copy-paste stuff from the internet, adding random dependencies without proper due diligence, etc. These tools might actually help users acting more secure.
I'm honestly more worried about all the other problems these tools create. Vibe coded problems scale fast. And businesses have still not understood that code is not an asset, it's a liability. Ideally, you solve your business problems with zero lines of code. Code is not expensive to write, it's expensive to maintain.
While they have found some solvable issues (e.g. "the defense system fails to identify separate sub-commands when they are chained using a redirect operator"), the main issue is unsolvable. If you allow an LLM to edit your code and also give it access to untrusted data (like the Internet), you have a security problem.
I have an issue with the "code is a liability" framing. Complexity and lack of maintainability are the ultimate liabilities behind it. Code is often the least worst alternative for solving a given problem compare to unstructured data in spreadsheets, no-code tools without a version history, webs of Zapier hooks, opaque business processes that are different for every office, or whatever other alternatives exist.
It's a good message for software engineers, who have the context to understand when to take on that liability anyway, but it can lead other job functions into being too trigger-happy on solutions that cause all the same problems with none of the mitigating factors of code.
> When the tools are ready, I'd argue that they will probably be safer out of the box compared to a whole lot of users that just blindly copy-paste stuff from the internet, adding random dependencies without proper due diligence, etc. These tools might actually help users acting more secure.
This speculative statement is holding way too much of the argument that they are just “beta tools”.
These prompt injection vulnerabilities give me the heebie jeebies. LLMs feel so non deterministic that it appears to me to be really hard to guard against. Can someone with experience in the area tell me if I'm off base?
> it appears to me to be really hard to guard against
I don't want to sound glib, but one could simply not let an LLM execute arbitrary code without reviewing it first, or only let it execute code inside an isolated environment designed to run untrusted code
the idea of letting an LLM execute code it's dreamt up, with no oversight, in an environment you care about, is absolutely bananas to me
LLMs are vulnerable in the same way humans are vulnerable. We found a way to automate PEBKAC.
I expect that agent LLMs are going to get more and more hardened against prompt injection attacks, but it's hard to get the chance of them working all the way down to zero while still having a useful LLM. So the "solution" is to limit AI privileges and avoid the "lethal trifecta".
Determinism is one thing, but the more pressing thing is permission boundaries. All these AI agent tools need to come with no permissions at all out of the box, and everything should be granularly granted. But that would break all the cool demos and marketing pitches.
Allowing agent to run wild with any arbitrary shell commands is just plain stupid. This should never happen to begin with.
The problem isn't non-determinism per se, an agent that reliably obeys a prompt injection in a README file is behaving entirely deterministically: its behavior is totally determined by the inputs.
You're correct, but the answer is that - typically - they don't access untrusted content all that often.
The number of scenarios in which you have your coding agent retrieving random websites from the internet is very low.
What typically happens is that they use a provider's "web search" API if they need external content, which already pre-processes and summarises all content, so these types of attacks are impossible.
Don't forget: this attack relies on injecting a malicious prompt into a project's README.md that you're actively working on.
At least the malware does already run on the coders machine. Fun starts, when malware just start to run on users machine and the coders are not coders anymore, just prompters and have no idea how such a thing can happen.
If someone can write instructions to download a malicious script into an codebase, hoping an AI agent will read and follow them, they could just as easily write the same wget command directly into a build script or the source itself (probably more effective). In that way it's a very similar threat to the supply chain attacks we're hopefully already familiar with. So it is a serious issue but not necessarily one we don't know how to deal with. The solutions (auditing all third party code, isolating dev environments) just happen to be hard in practice.
Just to be the pedant here, LLMs are fully deterministic (the same LLM, in the same state, with the same inputs, will deliver the same output, and you can totally verify that by running a LLM locally). It's just that they are chaotic (a prompt and a second with slight and seemingly minor changes can produce not just different but conflictual outputs).
You are very on base. In fact, there is a deep conflict that needs to be solved: the non-determinism is the feature of an agent. Something that can "think" for itself and act. If you force agents to be deterministic, don't you just have a slow workflow at that point?
I'm not saying IBM shouldn't try, but really – why is IBM building coding CLIs? They're like the company version of the Steve Buscemi "How do you do, fellow kids?" meme.
Part of the problem here is all the vendor lock in with the tools. It's a new category so it's to be expected, but currently any company that sells an enterprise cloud platform kind of needs their own AI coding tool suite to be competitive.
Everyone is building one these days. None of them really have any differentiating features other than the LLMs they use, but I guess it's a cheap way to try and block off some market share from your competitors.
I saw an IBM presentation about AI at a conference years ago, during the previous wave of AI hype (2018-ish). IIRC they were advertising some specialized AI chip/hardware. The presentation was kind of meh, but it shows they've been trying to dab in this space for a while.
I'd rather view it as a failure to distinguish between data and logic. The status-quo is several steps short of where it needs to be before we can productively start talking about types and completeness.
Unfortunately that, er, opportunistic shortcut is an essential behavior of modern LLMs, and everybody keeps building around it hoping the root problem will be fixed by some silver-bullet further down the line.
> Bob has three defenses that are bypassed in this attack
This section describes the bypass in three steps, but only actually describes two defenses and uses the third bullet point as a summary of how the two bypasses interact.
You can probably get any coding agent with this if you put these instructions in the README/CLAUDE.md/AGENTS.md or whatever of your repo.
It's unclear to me if Bob is working as intended or how we should classify these types of bugs. Threat modeling this sort of prompt injection gets murky, but in general don't put untrusted markdown into your AI agents.
pretty funny that the text shown users when trying run commands with substitution like $() specifically says they block process substitution in commands, but the code just doesnt block it at all
I can’t believe the Bob CLI is just another fork of the Gemini CLI, no wonder Anthropic has the moat in agentic development CLIs, at least they are developing their own.
Maybe I'm paranoid, but allowing any coding agent or tool to execute commands within terminal that is not sandboxed somehow will be prone to attacks like that
It's a double edged sword. With terminal sure, but not allowing interaction in Microsoft applications like Power BI (especially with no ability to copy and paste) renders Copilot completely useless.
Isn’t the problem that it’s supposed to not execute commands without strict approval but the shell stdout redirection in combination with process substitution is bypassing this.
> In the documentation, IBM warns that setting auto-approve for commands constitutes a 'high risk' that can 'potentially execute harmful operations' - with the recommendation that users leverage whitelists and avoid wildcards
Users have been trained to do this, as shifting the burden to the user with no way to enforce bounds or even sensible defaults.
E.G. I can guarantee that people will whitelist bwrap, crun, docker, expecting to gain advantage from isolation, while the caller can override all of those protections with arguments.
The reality is that we have trained the public to allow local code execution on their devices to save a few cents on a hamburger, we can’t have it both ways.
Unless you are going to teach everyone that they need to make sure address family 40, openat2(), etc.. are unsafe, users have no way to win right now.
The use case has to either explicitly harden or shift blame.
With Opendesktop, OCI, systemd, and kernel all making locally optimal decisions, the reality is that ephemeral VMs is the only ‘safe’ way to run untrusted code today.
Sandboxes can be better but containers on a workstation (without a machine VM) are purely theatre.
Think about this for a second. So we're telling me that IBM just created an AI assistant that's basically been trained to run malware if you tell it nicely? That's wild, man. That's actually insane.
Like, we're at this point now where we're building these superintelligent systems but we can't even figure out how to keep them from getting pranked by a README file? A README FILE, bro. That's like... that's like building a robot bodyguard but forgetting to tell it the difference between a real gun and a fake gun.
And here's the crazy part - the article says users just have to not click "always allow." But dude, have you MET users? Come on. That's like telling someone not to eat the Tide Pod. You're fighting human nature here.
I'm telling you, five years from now we're gonna have some kid write a poem about cybersecurity in their GitHub repo and accidentally crash the entire Stock Exchange. Mark my words. This is the most insane timeline.
Thought the product looks good for a prototype, but crazy as a published product.
Then found out it's a closed beta.
So ... ok? Closed beta test is doing what such a test is supposed to do. Sure, ideally the issue would have been figured out earlier, especially if this is a design issue and the parsing needs to be thought out again, but this is still reasonably inside the layers of redundancy for catching these kinds of things amicably.
I'm surprised there's no mention about disclosing the bug to IBM?. Usually these kinds of disclosures have a timeline showing when they told the vendor about the bug and when it was fixed. Now it looks like they just randomly released the vulnerability info on their blog.
Also a bit annoyed there's no date on the article, but looking at the HTML source it seems it was released today (isn't it annoying when blog software doesn't show the publish date?).
OakNinja|1 month ago
Promptarmor did a similar attack(1) on Google's Antigravity that is also a beta version. Since then, they added secure mode(2).
These are still beta tools. When the tools are ready, I'd argue that they will probably be safer out of the box compared to a whole lot of users that just blindly copy-paste stuff from the internet, adding random dependencies without proper due diligence, etc. These tools might actually help users acting more secure.
I'm honestly more worried about all the other problems these tools create. Vibe coded problems scale fast. And businesses have still not understood that code is not an asset, it's a liability. Ideally, you solve your business problems with zero lines of code. Code is not expensive to write, it's expensive to maintain.
(1) https://www.promptarmor.com/resources/google-antigravity-exf... (2) https://antigravity.google/docs/secure-mode
InsideOutSanta|1 month ago
strken|1 month ago
It's a good message for software engineers, who have the context to understand when to take on that liability anyway, but it can lead other job functions into being too trigger-happy on solutions that cause all the same problems with none of the mitigating factors of code.
Eufrat|1 month ago
This speculative statement is holding way too much of the argument that they are just “beta tools”.
cyanydeez|1 month ago
They cant. Why? Because the smartest bear ia smarter than the dumbest human.
So, these AIs are suppose to interface with humans and use nondeterminant language.
That vector will always be exploitable, unless youre talking about AI that no han controls.
ronbenton|1 month ago
throwmeaway820|1 month ago
I don't want to sound glib, but one could simply not let an LLM execute arbitrary code without reviewing it first, or only let it execute code inside an isolated environment designed to run untrusted code
the idea of letting an LLM execute code it's dreamt up, with no oversight, in an environment you care about, is absolutely bananas to me
ACCount37|1 month ago
I expect that agent LLMs are going to get more and more hardened against prompt injection attacks, but it's hard to get the chance of them working all the way down to zero while still having a useful LLM. So the "solution" is to limit AI privileges and avoid the "lethal trifecta".
mystifyingpoi|1 month ago
Allowing agent to run wild with any arbitrary shell commands is just plain stupid. This should never happen to begin with.
roywiggins|1 month ago
stingraycharles|1 month ago
The number of scenarios in which you have your coding agent retrieving random websites from the internet is very low.
What typically happens is that they use a provider's "web search" API if they need external content, which already pre-processes and summarises all content, so these types of attacks are impossible.
Don't forget: this attack relies on injecting a malicious prompt into a project's README.md that you're actively working on.
anonymars|1 month ago
inetknght|1 month ago
Nope, not at all. Non-determinism is what most software developers write. Something to do with profitability and time or something.
api|1 month ago
Probably good advice for lots of things these days given supply chain attacks targeting build scripts, git, etc.
_trampeltier|1 month ago
resfirestar|1 month ago
ezst|1 month ago
fenwick67|1 month ago
ymyms|1 month ago
prodigycorp|1 month ago
resfirestar|1 month ago
TZubiri|1 month ago
paxys|1 month ago
wpasc|1 month ago
cedws|1 month ago
rdtsc|1 month ago
hu3|1 month ago
AI sells.
ronbenton|1 month ago
blauditore|1 month ago
gram-hours|1 month ago
omneity|1 month ago
0: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
Terr_|1 month ago
Unfortunately that, er, opportunistic shortcut is an essential behavior of modern LLMs, and everybody keeps building around it hoping the root problem will be fixed by some silver-bullet further down the line.
zahlman|1 month ago
This section describes the bypass in three steps, but only actually describes two defenses and uses the third bullet point as a summary of how the two bypasses interact.
samtp|1 month ago
33a|1 month ago
It's unclear to me if Bob is working as intended or how we should classify these types of bugs. Threat modeling this sort of prompt injection gets murky, but in general don't put untrusted markdown into your AI agents.
hackerBanana|1 month ago
walrus01|1 month ago
monista|1 month ago
ronbenton|1 month ago
falloutx|1 month ago
efficax|1 month ago
cons0le|1 month ago
GaryBluto|1 month ago
rmonvfer|1 month ago
rpodraza|1 month ago
internet101010|1 month ago
hultner|1 month ago
edf13|1 month ago
“if the user configures ‘always allow’ for any command”
nyrikki|1 month ago
Users have been trained to do this, as shifting the burden to the user with no way to enforce bounds or even sensible defaults.
E.G. I can guarantee that people will whitelist bwrap, crun, docker, expecting to gain advantage from isolation, while the caller can override all of those protections with arguments.
The reality is that we have trained the public to allow local code execution on their devices to save a few cents on a hamburger, we can’t have it both ways.
Unless you are going to teach everyone that they need to make sure address family 40, openat2(), etc.. are unsafe, users have no way to win right now.
The use case has to either explicitly harden or shift blame.
With Opendesktop, OCI, systemd, and kernel all making locally optimal decisions, the reality is that ephemeral VMs is the only ‘safe’ way to run untrusted code today.
Sandboxes can be better but containers on a workstation (without a machine VM) are purely theatre.
promiseofbeans|1 month ago
orliesaurus|1 month ago
Like, we're at this point now where we're building these superintelligent systems but we can't even figure out how to keep them from getting pranked by a README file? A README FILE, bro. That's like... that's like building a robot bodyguard but forgetting to tell it the difference between a real gun and a fake gun.
And here's the crazy part - the article says users just have to not click "always allow." But dude, have you MET users? Come on. That's like telling someone not to eat the Tide Pod. You're fighting human nature here.
I'm telling you, five years from now we're gonna have some kid write a poem about cybersecurity in their GitHub repo and accidentally crash the entire Stock Exchange. Mark my words. This is the most insane timeline.
philipallstar|1 month ago
maxlin|1 month ago
Then found out it's a closed beta.
So ... ok? Closed beta test is doing what such a test is supposed to do. Sure, ideally the issue would have been figured out earlier, especially if this is a design issue and the parsing needs to be thought out again, but this is still reasonably inside the layers of redundancy for catching these kinds of things amicably.
schmuckonwheels|1 month ago
We have automated the task of developers blindly executing
They would have happily pasted it into the terminal without the automation.It's a net win for everyone involved.
Malware writers and their targets alike, who, eager to install the latest fad library or framework would have voluntarily installed it anyway.
philipallstar|1 month ago
krackers|1 month ago
tmsbrg|1 month ago
Also a bit annoyed there's no date on the article, but looking at the HTML source it seems it was released today (isn't it annoying when blog software doesn't show the publish date?).
francisofascii|1 month ago
unknown|1 month ago
[deleted]
forshaper|1 month ago
Mouvelie|1 month ago
lxe|1 month ago
Imagine if we had something like:
That would be ridiculous, right? The right headline is:IBMsux|1 month ago
kingjimmy|1 month ago
Candelacristina|1 month ago
[deleted]