top | item 43508418

We hacked Gemini's Python sandbox and leaked its source code (at least some)

669 points| topsycatt | 11 months ago |landh.tech

144 comments

order

topsycatt|11 months ago

That's the system I work on! Please feel free to ask any questions. All opinions are my own and do not represent those of my employer.

ryao|11 months ago

I imagine you need to make and destroy sandboxed environments quite often. How fast does your code create a sandboxed environment?

Do you make the environments on demand or do you make them preemptively so that one is ready to go the moment that it is needed?

If you make them on demand, have you tested ZFS snapshots to see if it can be done even faster using zfs clone?

hnuser123456|11 months ago

Is the interactive python sandbox incompatible with thinking models? It seems like I can only get the interactive sandbox by using 2.0 flash, not 2.0 flash thinking or 2.5 pro.

wunderwuzzi23|11 months ago

That's cool. I did something similar in the early days with Google Bard when data visualization was added, which I believe was when the ability to run code got introduced.

One question I always had was what the user "grte" stands for...

Btw. here the tricks I used back then to scrape the file system:

https://embracethered.com/blog/posts/2024/exploring-google-b...

fragmede|11 months ago

Do you think "hacked Gemini and leaked its source code" is an accurate representation of what happened here?

enoughalready|11 months ago

Have you contemplated running the python code in a virtual environment in the browser?

seydor|11 months ago

you re the hacker or the google?

Mindwipe|11 months ago

Does anyone at Google care that you're trying to replace Assistant with this in the next few months and it can't set a timer yet?

(I mean it will tell you it's set a timer but it doesn't talk to the native clock app so nothing ever goes off if you navigate away from the window.)

jwlake|11 months ago

Is there any reason it's not documented?

KennyBlanken|11 months ago

Can you get someone to fix the CSS crap on the website? When I have it open it uses 40-50% of my GPU (normally ~5% in most usage)...and when I try to scroll, the scrolling is jerky mess?

simonw|11 months ago

I've been using a similar trick to scrape the visible internal source code of ChatGPT Code Interpreter into a GitHub repository for a while now: https://github.com/simonw/scrape-openai-code-interpreter

It's mostly useful for tracking what Python packages are available (and what versions): https://github.com/simonw/scrape-openai-code-interpreter/blo...

Zopieux|11 months ago

Meanwhile they could just decide to publish this list in a document somewhere and keep it automatically up to date with their infra.

But not, secrecy for the sake of secrecy.

lqstuart|11 months ago

So by “we hacked Gemini and leaked its source code” you really mean “we played with Gemini with the help of Google’s security team and didn’t leak anything”

worldsavior|11 months ago

Sad that I didn't read this comment before reading this article.

tgtweak|11 months ago

The definition of hacking is getting pretty loose. This looks like the sandbox is doing exactly what it's supposed to do and nothing sensitive was exfiltrated...

bluelightning2k|11 months ago

Cool write up. Although it's not exactly a huge vulnerability. I guess it says a lot about how security conscious Google is that they consider this to be significant. (You did mention that you knew the company's specific policy considered this highly confidential so it does count but it feels a little more like "technically considered a vulnerability" rather than clearly one.)

jll29|11 months ago

Running the built-in "strings" command to extract a few file names from a binary is hardly hacking/cracking.

Ironically, though, getting the source code of Gemini perhaps wouln't be valuable at all; but if you had found/obtained access to the corpus that the model was pre-trained with, that would have been kind of interesting (many folks have many questions about that...).

dvt|11 months ago

> but if you had found/obtained access to the corpus that the model was pre-trained with, that would have been kind of interesting

Definitionally, that input gets compressed into the weights. Pretty sure there's a proof somewhere that shows LLM training is basically a one-way (lossy) compression, so there's no way to go back afaik?

theLiminator|11 months ago

It's actually pretty interesting that this shows that Google is quite secure, I feel like most companies would not fare nearly as well.

kccqzy|11 months ago

Yes and especially the article mentions "With the help of the Google Security Team" so it's quite collaborative and not exactly black box hacking.

commandersaki|11 months ago

Their "LLM bugSWAT" events, held in vibrant locales like Las Vegas, are a testament to their commitment to proactive security red teaming.

I don't understand why security conferences are attracted to Vegas. In my opinion its a pretty gross place to conduct any conference.

lmm|11 months ago

Excluding uptight scolds is a feature not a bug. There's a lot of overlap between people who find Vegas objectionable and people who find red teaming objectionable (because why would any decent person know attacking/exploiting techniques).

zem|11 months ago

relatively cheap event space and hotels. it's hard to find a city to host a large conference.

desmosxxx|11 months ago

What don't you understand. Vegas is literally built for conferences.

hashstring|11 months ago

Real, I feel the exact same way.

numbsafari|11 months ago

You answered your own question.

ein0p|11 months ago

They hacked the sandbox, and leaked nothing. The article is entertaining though.

kccqzy|11 months ago

They leaked one file in the sandbox that contained lots of internal proto files. The security team reviewed everything in the sandbox and thought nothing in it is sensitive and gave the green light; apparently the review didn't catch this in the sandbox.

I guess this is a failing of the security review process, and possibly also how the blaze build system worked so well that people forgot a step existed because it was too automated.

fpgaminer|11 months ago

Awww, I was looking forward to seeing some of the leak ;) Oh well. Nice find and breakdown!

Somewhat relatedly, it occurred to me recently just how important issues like prompt injection, etc are for LLMs. I've always brushed them off as unimportant to _me_ since I'm most interested in local LLMs. Who cares if a local LLM is weak to prompt injection or other shenanigans? It's my AI to do with as I please. If anything I want them to be, since it makes it easier to jailbreak them.

Then Operator and Deep Research came out and it finally made sense to me. When we finally have our own AI Agents running locally doing jobs for us, they're going to encounter random internet content. And the AI Agent obviously needs to read that content, or view the images. And if it's doing that, then it's vulnerable to prompt injection by third party.

Which, yeah, duh, stupid me. But ... is also a really fascinating idea to consider. A future where people have personal AIs, and those AIs can get hacked by reading the wrong thing from the wrong backalley of the internet, and suddenly they are taken over by a mind virus of sorts. What a wild future.

20after4|11 months ago

> reading the wrong thing from the wrong backalley of the internet, and suddenly they are taken over by a mind virus of sorts. What a wild future.

This already happens to people on the internet.

Cymatickot|11 months ago

Probably best text I've seen in AI train ride recently:

""""" As companies rush to deploy AI assistants, classifiers, and a myriad of other LLM-powered tools, a critical question remains: are we building securely ? As we highlighted last year, the rapid adoption sometimes feels like we forgot the fundamental security principles, opening the door to novel and familiar vulnerabilities alike. """"

There this case and there many other cases. I worry for copy & paste dev.

qwertox|11 months ago

Super interesting article.

> but those files are internal categories Google uses to classify user data.

I really want to know what kind of classification this is. Could you at least give one example? Like "Has autism" or more like "Is user's phone number"?

StephenAmar|11 months ago

The latter. Like is it a public ID, an IP, user input, ssn, phone number, lat/long…

Very useful for any scenario where you output the proto, like logs, etc…

mr_00ff00|11 months ago

Slightly irrelevant, but love the color theme on the python code snippets. Wish I knew what it was.

b0ner_t0ner|11 months ago

Very distracting background/design on desktop; had to toggle reader view.

paxys|11 months ago

Funny enough while "We hacked Google's AI" is going to get the clicks, in reality they hacked the one part of Gemini that was NOT the LLM (a sandbox environment meant to run untrusted user-provided code).

And "leaked its source code" is straight up click bait.

dang|11 months ago

Ok, we put the sandbox in the title above. Thanks!

(Submitted title was "We hacked Google's A.I Gemini and leaked its source code (at least some part)")

IshKebab|11 months ago

They didn't even hack it.

HenryBemis|11 months ago

Click and cash (for the great trio).

sneak|11 months ago

> However, the build pipeline for compiling the sandbox binary included an automated step that adds security proto files to a binary whenever it detects that the binary might need them to enforce internal rules. In this particular case, that step wasn’t necessary, resulting in the unintended inclusion of highly confidential internal protos in the wild !

Protobufs aren't really these super secret hyper-proprietary things they seem to make them out to be in this breathless article.

film42|11 months ago

No, but having the names to the fields, directly from Google, is very helpful for further understanding what's available from within the sandbox.

ratorx|11 months ago

Yup, there’s no reason to believe that the proto files (which are definitions rather than data) are any more confidential than the Gemini source code itself.

daeken|11 months ago

Yeah, this is honestly super interesting as a journey, but not as a destination. The framing takes away from how cool the work really is.

ipsum2|11 months ago

Yes, there's a lot of internal protos from Google that are leaked on the internet. If I recall correctly, it was a hacker News comment that linked to it.

Edit: I don't know why the parent comment was flagged. It is entirely accurate.

whatevertrevor|11 months ago

The protos in question are related to internal authn/z so it's conceivable that having access to that structure would be valuable information to an attacker.