Just two days ago I flipped through a slide deck from a security conference in which the author, Jossef Harush Kadouri, shows that using a model from a place like Huggingface means the author of the model can execute any code on your machine. Not sure if the slides are uploaded anywhere else; I got them sent as a file: https://dro.pm/c.pdf (45MB), slide 188.
I didn't realise at the time I flipped through the slides that this means it's not only the model's author who gets to run code on your machine: the same goes if Huggingface got a court-signed letter, or if someone hacked them (especially if they don't notice for a while¹).
As someone not in the AI scene, I've never run these models, but I was surprised at how quickly the industry standardised the format. I had assumed model files were big matrices of numbers and perhaps some metadata, but now I understand how they managed it so quickly: a model is (eyeing slides 186 and 195) a Python script that can do whatever it wants. That makes "standardisation" exceedingly easy: everyone can do their own thing and you sidestep the problem altogether. But that comes with a cost.
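To make the failure mode concrete: this is the classic Python pickle problem. A hedged, harmless sketch, with `eval` of an arithmetic expression standing in for what could just as easily be `os.system`:

```python
import pickle

# A "model file" that is really a code-execution payload.
# Anything defining __reduce__ can make pickle.loads() call an
# arbitrary function during deserialization.
class MaliciousModel:
    def __reduce__(self):
        # Harmless stand-in payload; a real attack would call
        # os.system, urllib.request.urlopen, etc.
        return (eval, ("6 * 7",))

payload = pickle.dumps(MaliciousModel())

# The victim just "loads a model" -- and the attacker's code runs.
result = pickle.loads(payload)
print(result)  # 42: eval() ran as a side effect of loading
```

Nothing here is specific to any ML library; it's a property of the pickle format itself, which is why "the model is a Python script" is effectively true for pickle-based model files.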
> I had assumed model files were big matrices of numbers and some metadata perhaps
ONNX [1] is more or less this, but the challenge you immediately run into is models with custom layers/operators with their own inference logic - you either have to implement those operators in terms of the supported ops (not necessarily practical or viable) or provide the implementation of the operator to the runtime, putting you back at square one.
As others have pointed out, this is format-dependent. One format that hasn't been mentioned in this thread yet is GGUF, used by llama.cpp and its derivatives. It's pretty much "big matrices of numbers and some metadata." Some vulnerabilities were found [1] and patched.
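For contrast with pickle, a GGUF file starts with a small fixed header followed by typed key/value metadata and raw tensor data; there is nothing executable to "load." A sketch of parsing just that fixed header (field layout per the llama.cpp GGUF spec as I understand it; hedged):

```python
import struct

# GGUF fixed header: 4-byte magic "GGUF", u32 version,
# u64 tensor count, u64 metadata-KV count (little-endian).
def parse_gguf_header(buf: bytes):
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version, n_tensors, n_kv

# Build a minimal fake header to parse: version 3, 2 tensors, 5 KV pairs.
fake = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(parse_gguf_header(fake))  # (3, 2, 5)
```

A parser bug here can still be a memory-safety vulnerability (which is what the patched issues were), but the format itself only ever describes data.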
(The dro.pm link will expire any minute now. It's so short because it's temporary; I should maybe have used a more permanent service. I've found the talk here in case you're reading this later: https://m.youtube.com/watch?v=8XysLIq-e3s)
> Just two days ago I flipped through a slide deck from a security conference where the author, Jossef Harush Kadouri, found that using a model from a place like Huggingface means the author of the model can execute any code on your machine.
I thought it was pretty good actually. Most of these leak disclosures usually say things like "We do not have evidence they accessed any secrets" or something like that, because they don't "know" what the hackers did once they were in. At least huggingface is saying "Yeah, they probably accessed secrets but we can't confirm it"
> Over the past few days, we have made other significant improvements to the security of the Spaces infrastructure, including completely removing org tokens (resulting in increased traceability and audit capabilities), implementing key management service (KMS) for Spaces secrets, robustifying and expanding our system’s ability to identify leaked tokens and proactively invalidate them, and more generally improving our security across the board.
That's a serious amount of non-trivial work to be done in "a few days". The kind of work that should trigger more time consuming activities like security audits, pen tests and the like, before going live, right?
At a larger organization with a whole SRE department that includes a dedicated security team, sure, but (my impression is) Huggingface isn't that size of an org (yet).
My OpenAI key was leaked and I noticed someone was using it; luckily the damage wasn't nearly as bad as yours. A few dollars' worth of GPT-4, a model none of my apps were using at the time.
I’m almost entirely certain it was leaked via secrets on HF space, I got a message a few days ago warning me some of my spaces were affected
I noticed a few weeks ago that some of my OpenAI keys got compromised; they were only active as secrets on a Huggingface space. I got an email a few days ago informing me that the spaces were compromised, so I suspect this issue has been going on for at least a few weeks.
Usually you need some sort of "token" that lets you practically operate within a browser session. It seems like this is about tokens they had to revoke, which are kinda like passwords but not.
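For what it's worth, a sketch of why API tokens behave like passwords but aren't quite: the service mints them, stores only a hash, and can revoke them en masse without touching anyone's password. The names and storage scheme here are illustrative, not Huggingface's actual design:

```python
import hashlib
import secrets

# Toy token store: maps sha256(token) -> owning user.
def issue_token(store: dict, user: str) -> str:
    token = secrets.token_urlsafe(32)          # high-entropy, service-generated
    store[hashlib.sha256(token.encode()).hexdigest()] = user
    return token                               # shown to the user exactly once

def check_token(store: dict, token: str):
    return store.get(hashlib.sha256(token.encode()).hexdigest())

def revoke_user(store: dict, user: str):
    # Mass revocation after a breach -- what "tokens they had to revoke" means.
    for h in [h for h, u in store.items() if u == user]:
        del store[h]

db = {}
t = issue_token(db, "alice")
print(check_token(db, t))   # 'alice'
revoke_user(db, "alice")
print(check_token(db, t))   # None -- the old token is dead
```

The key difference from a password: the user never chose it, it grants scoped access, and killing it costs the service nothing.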
It's just occurred to me that if aliens wanted to take over Earth... they could progressively leak scientific secrets under the disguise of normal scientific progress. This would lead us to create a Trojan-ed AGI that would take over everything and just build spaceships to ship them all our Palladium... Just imagine a giant spaceship on the way to Proxima Centauri full of stolen catalytic converters...
Can't get into the details, but it seems there is a way to convert Palladium into Dilithium Crystals. When you achieve that all hell breaks loose....
Aachen | 1 year ago:

¹ https://www.verizon.com/business/resources/articles/s/how-to... says 20% of victims don't notice for months; of course, it depends on the situation and what actions the attackers take.
strangecasts | 1 year ago:

[1] https://onnx.ai/
worstspotgain | 1 year ago:

[1] https://www.databricks.com/blog/ggml-gguf-file-format-vulner...
d-z-m | 1 year ago:

To my knowledge this is only a problem if the model is serialized/deserialized via pickle [0].

[0]: https://huggingface.co/docs/hub/en/security-pickle
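The usual safe alternative (e.g. the safetensors format) avoids pickle entirely: a JSON header describing names, dtypes, and byte offsets, followed by raw tensor bytes, so loading can only ever produce data. A stdlib-only sketch of that idea (not the actual safetensors layout):

```python
import json
import struct

# Serialize: u64 header length, JSON header, then raw little-endian f32 blobs.
def save(tensors: dict) -> bytes:
    header, blob, offset = {}, b"", 0
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)
        header[name] = {"dtype": "f32", "len": len(values),
                        "offsets": [offset, offset + len(data)]}
        blob += data
        offset += len(data)
    h = json.dumps(header).encode()
    return struct.pack("<Q", len(h)) + h + blob

# Deserialize: parse the header, slice the raw bytes. No code paths exist
# that could execute attacker-controlled logic.
def load(buf: bytes) -> dict:
    (hlen,) = struct.unpack_from("<Q", buf, 0)
    header = json.loads(buf[8:8 + hlen])
    body = buf[8 + hlen:]
    return {name: list(struct.unpack(f"<{m['len']}f",
                                     body[m["offsets"][0]:m["offsets"][1]]))
            for name, m in header.items()}

weights = {"w": [1.0, 2.0], "b": [0.5]}
print(load(save(weights)) == weights)  # True: lossless round-trip
```

The worst a malformed file can do here is raise a parse error, which is exactly the property pickle lacks.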
mvandermeulen | 1 year ago:

Proceeds to link to pdf of unknown origins
Mo3 | 1 year ago:

Another day..
nmstoker | 1 year ago:

Or is this purely about theft of data/code?
belter | 1 year ago:

https://huggingface.co/docs/hub/en/spaces-overview

The front end/portal. I speculate that it's coded in Python, maybe some Django thing...
TekMol | 1 year ago:

Couldn't they just store a public key, and have the user keep the secret key and sign their requests with it?