top | item 44631002

(no title)

An LLM - which has functionally infinite unverifiable attack surface - directly wired into a payment system with high authentication. How could anyone anticipate this going wrong?

I feel like everyone is saying 'we're still discovering what LLMs are good at' but it also feels like we really need to get in our collective conscious what they're really, really, bad at.

discuss

Aurornis|7 months ago

> An LLM - which has functionally infinite unverifiable attack surface - directly wired into a payment system with high authentication. How could anyone anticipate this going wrong?

If you didn’t catch it, this scenario was fabricated for this blog post. The company writing the post sells vulnerability testing tools.

This isn’t what a real production system even looks like. They’re using Claude Desktop. I mean I guess someone who doesn’t know better could connect Stripe and iMessage to Claude Desktop and then give the Stripe integration full permissions. It’s possible. But this post wasn’t an exploit of a real world system they found. They created it and then exploited it as an example. They sell services to supposedly scan for vulnerabilities like this.

rhavaeis|7 months ago

> This isn’t what a real production system even looks like. They’re using Claude Desktop. I mean I guess someone who doesn’t know better could connect Stripe and iMessage to Claude Desktop and then give the Stripe integration full permissions.

The core issue here is not whether or not people will connect stripe and iMessage at the same time or not. The issue is that as long as you connect iMessage, attackers can call any arbitrary tools and do what they want. It could be your Gmail, Calendar, or anything else. This is just showcasing that Claude can not distinguish between fabricated messages and real ones.

btown|7 months ago

Even if this is a fabricated system, there are all sorts of sensitive things that might be made accessible to an LLM that is fed user-generated data.

For instance, say you have an internal read-only system that knows some details about your proprietary vendor relationships. You wire up an LLM with an internal MCP server to "return the ID and title of the most appropriate product for a customer inquiry." All is well until the customer/attacker submits a form containing text that looks like the JSON for MCP back-and-forth traffic, and aims to exfiltrate your data. Sure, all that JSON was escaped, but you're still trusting that the LLM doesn't get confused, and that the attention heads know what's real JSON and what's fake JSON.

We know not to send sensitive data to the browser, no matter how obfuscated or obscure. What I think is an important mental model is that once your data is being accessed by an LLM, and there's any kind of user data involved, that's an almost equally untrusted environment. You can mitigate, pre-screen for prompt injection-y things, but at the end of the day it may not be enough.

wredcoll|7 months ago

Are these the same guys who had the post here like 2 days ago about how you could "hack claude over email" or some such?

bryant|7 months ago

Companies are rushing or skipping a lot of required underlying security controls in a quest to be first or quick to market with what they think is transformative applications of AI. And so far, probably very few have gotten it right and generally only with serious spend.

For instance, how many companies do you think have played with dedicated identities for each instance of their agents? Let alone hard-restricting those identities (not via system prompts but with good old fashioned access controls) to only the data and functions they're supposed to be entitled to for just that session?

It's a pretty slim number. Only reason I'm not guessing zero is because it wouldn't surprise me if maybe one company got it right. But if there was a way to prove that nobody's doing this right, I'd bet money on it for laughs. These are things that in theory we should've been doing before AI happened, and yet it's all technical debt alongside every "low" or "medium" risk for most companies because up until now, no one could rationalize the spend.

buu700|7 months ago

The sad thing is it's not even difficult to get right. I've got something launching soon with a couple different chatbots that I'll share with you later, and it would never even have occurred to me to rely on system prompts for security. A chatbot in my mind is just a CLI with extra steps; if the bot is given access to something, the user is presumed to have equal access.

sothatsit|7 months ago

Honestly, I cannot even believe that Stripe MCP exists, outside of maybe being a useful tool for setting up a Stripe test environment and nothing more. I'm terrified of giving an LLM access to anything that is not a text document that I can commit to git and revert if it does something wrong.

refulgentis|7 months ago

Somehow this site keeps making these posts and making it up front page and people keep sharing the same opinions

DrewADesign|7 months ago

> Somehow this site keeps making these posts and making it up front page and people keep sharing the same opinions

You sure? In their 5 month submit history, they’ve got one post with nearly 900 votes, this post, one post with 17, and a handful of others that didn’t break 10. Perhaps you’re confusing it with another site.

bugbuddy|7 months ago

This event was predicted by the Oracle of Delphi. Seriously, everyone knew this was just waiting to happen. The pawning will continue until everyone stops water-hosing the kool aid.