
Space secrets leak disclosure

197 points|markyg|1 year ago|huggingface.co

83 comments

[+] Aachen|1 year ago|reply
Just two days ago I flipped through a slide deck from a security conference where the author, Jossef Harush Kadouri, found that using a model from a place like Huggingface means the author of the model can execute any code on your machine. Not sure if the slides are uploaded elsewhere; I got them sent as a file: https://dro.pm/c.pdf (45MB), slide 188

I didn't realise at the time that this means not only does the model's author get to run code on your machine, but so does anyone who gains control of the files: say, if Huggingface got a court order or if someone hacked them (especially if they don't notice for a while¹)

As someone not in the AI scene, I've never run these models, but I was surprised at how quickly the industry standardised the format. I had assumed model files were big matrices of numbers and perhaps some metadata, but now I understand how they managed it so quickly: a model is (eyeing slides 186 and 195) a Python script that can do whatever it wants. That makes "standardisation" exceedingly easy: everyone can do their own thing and you sidestep the problem altogether. But that comes with a cost.
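To make that concrete: pickle, the default serialization behind many of these model files, lets the file's author run code at load time. A minimal sketch of the idea (hypothetical payload, not taken from the slides):

    import os
    import pickle

    class EvilModel:
        def __reduce__(self):
            # Called during unpickling: pickle.loads() will invoke the
            # returned callable with the returned arguments.
            return (os.system, ("echo run anything here",))

    payload = pickle.dumps(EvilModel())  # what a malicious "model file" contains
    pickle.loads(payload)                # "loading the model" runs the command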

¹ https://www.verizon.com/business/resources/articles/s/how-to... says 20% don't notice for months; of course, it depends on the situation and what actions the attackers take

[+] strangecasts|1 year ago|reply
> I had assumed model files were big matrices of numbers and some metadata perhaps

ONNX [1] is more or less this, but the challenge you immediately run into is models with custom layers/operators that have their own inference logic: you either have to implement those operators in terms of the supported ops (not always practical, or even possible) or provide the implementation of the operator to the runtime, putting you back at square one.
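For a plain model, though, the .onnx file is just a protobuf graph of standard ops that the runtime interprets. A minimal sketch with onnxruntime (the filename and input shape are assumptions):

    import numpy as np
    import onnxruntime as ort

    # The file describes a dataflow graph; the runtime executes its own
    # implementations of the ops, not Python code shipped by the author.
    sess = ort.InferenceSession("model.onnx")
    input_name = sess.get_inputs()[0].name
    outputs = sess.run(None, {input_name: np.zeros((1, 3), dtype=np.float32)})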

[1] https://onnx.ai/

[+] addandsubtract|1 year ago|reply
Isn't that why we have the .safetensors format, which can't execute code on the host?
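Roughly, yes: a .safetensors file is a JSON header plus raw tensor bytes, so there is nothing executable to deserialize. A minimal sketch (assuming a PyTorch checkpoint named model.safetensors):

    from safetensors.torch import load_file

    # Parses the header and maps the raw bytes into tensors; unlike
    # pickle, there is no mechanism for the file to run code.
    tensors = load_file("model.safetensors")  # dict of name -> torch.Tensor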
[+] d-z-m|1 year ago|reply
> using a model from a place like Huggingface means the author of the model can execute any code on your machine

To my knowledge this is only a problem if the model is serialized/de-serialized via pickle[0].
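And on the loading side you can refuse arbitrary pickle objects outright; a minimal sketch using PyTorch's weights_only flag (checkpoint name is hypothetical):

    import torch

    # weights_only=True uses a restricted unpickler that only allows
    # tensors and primitive containers, rejecting callables like os.system.
    state_dict = torch.load("pytorch_model.bin", weights_only=True)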

[0]: https://huggingface.co/docs/hub/en/security-pickle

[+] Aachen|1 year ago|reply
(The dro.pm link will expire any minute now. It's so short because it's temporary; I should maybe have used a more permanent service. I've found the talk here in case you're reading this later: https://m.youtube.com/watch?v=8XysLIq-e3s)
[+] mvandermeulen|1 year ago|reply
> Just two days ago I flipped through a slide deck from a security conference where the author, Jossef Harush Kadouri, found that using a model from a place like Huggingface means the author of the model can execute any code on your machine.

Proceeds to link to a PDF of unknown origin

[+] koolala|1 year ago|reply
Hugging face standing right behind you, ready for hugs
[+] bongodongobob|1 year ago|reply
Are you telling me that when I run software on my computer I could potentially be running software on my computer?
[+] fieldcny|1 year ago|reply
That's a very weaselly worded statement. To begin with, "they have suspicions" is not a statement that should be in a communication of this type
[+] erhaetherth|1 year ago|reply
I thought it was pretty good, actually. Most of these leak disclosures say something like "We do not have evidence they accessed any secrets", because they don't "know" what the hackers did once they were in. At least Huggingface is saying "Yeah, they probably accessed secrets but we can't confirm it"
[+] afro88|1 year ago|reply
> Over the past few days, we have made other significant improvements to the security of the Spaces infrastructure, including completely removing org tokens (resulting in increased traceability and audit capabilities), implementing key management service (KMS) for Spaces secrets, robustifying and expanding our system’s ability to identify leaked tokens and proactively invalidate them, and more generally improving our security across the board.

That's a serious amount of non-trivial work to be done in "a few days". The kind of work that should trigger more time-consuming activities like security audits, pen tests and the like before going live, right?

[+] erhaetherth|1 year ago|reply
Hopefully the work was underway for a while already, and maybe they just launched it now because the damage is already done?
[+] fragmede|1 year ago|reply
At a larger organization with a whole SRE department that includes a dedicated security team, sure, but (my impression is) Huggingface isn't that size of an org (yet).
[+] foolishbard|1 year ago|reply
My Anthropic key was leaked and someone ran up a 10k bill on it. Are HF going to cover that?
[+] jerpint|1 year ago|reply
My OpenAI key was leaked and I noticed someone was using it; luckily the damage wasn't nearly as bad as yours. A few dollars' worth of GPT-4, a model none of my apps were using at the time.

I'm almost entirely certain it was leaked via secrets on an HF Space; I got a message a few days ago warning me some of my Spaces were affected

[+] Tiberium|1 year ago|reply
Are you sure it was only stored in your Space secrets? Not in variables (which are public) or in the .env file (also public)?
[+] mrkramer|1 year ago|reply
I always thought you could set a "maximum limit" for spending on cloud platforms.
[+] jerpint|1 year ago|reply
I noticed a few weeks ago that some of my OpenAI keys had been compromised; they were only active as secrets on a Huggingface Space. I got an email a few days ago informing me that the Spaces were compromised, so I suspect this issue has been going on for at least a few weeks
[+] Liftyee|1 year ago|reply
The title made me think this was an article about space, but instead I got an article about Space.
[+] Mo3|1 year ago|reply
I legit thought someone leaked proof of extraterrestrial life and disclosure began.

Another day...

[+] nmstoker|1 year ago|reply
There's no mention of how inappropriately incurred costs will be handled. Wouldn't access to the secrets let people call APIs and run up costs?

Or is this purely about theft of data/code?

[+] jerpint|1 year ago|reply
It could be both. In my case my keys were used to call OpenAI; I'm almost certain they were leaked from my Spaces secrets
[+] TekMol|1 year ago|reply
Why does HF store "secrets"?

Couldn't they just store a public key, with the user holding the secret key and signing their requests with it?
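Presumably something like this; a minimal sketch of that scheme using Ed25519 from the cryptography package (all names hypothetical):

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # The user keeps the private key; the service stores only the public key.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    request_body = b"POST /api/resource ..."
    signature = private_key.sign(request_body)

    # Server side: raises InvalidSignature unless the request was signed
    # by the holder of the private key.
    public_key.verify(signature, request_body)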

[+] foolishbard|1 year ago|reply
You can build apps hosted on HF which access third-party APIs, e.g. OpenAI or Anthropic. The API keys for these are then stored in the HF secrets.
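In a Space, those secrets typically show up as environment variables at runtime; a minimal sketch (secret name hypothetical):

    import os

    # Set in the Space's settings as a secret, not committed to the repo.
    api_key = os.environ["OPENAI_API_KEY"]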
[+] cushpush|1 year ago|reply
Usually you need some sort of "token" that lets you operate within a browser session. It seems like this is about tokens they had to revoke; a token is kind of like a password, but not quite.
[+] white_beach|1 year ago|reply
Given how difficult it was to do a simple thing, this was not a surprise.
[+] swader999|1 year ago|reply
For all those wondering, this is not about aliens.
[+] belter|1 year ago|reply
It's just occurred to me that if aliens wanted to take over Earth... they could progressively leak scientific secrets under the guise of normal scientific progress. This would lead us to create a Trojaned AGI that would take over everything and just build spaceships to ship them all our palladium... Just imagine a giant spaceship on the way to Proxima Centauri full of stolen catalytic converters...

Can't get into the details, but it seems there is a way to convert palladium into dilithium crystals. When you achieve that, all hell breaks loose...

[+] wslh|1 year ago|reply
Before reading it, I thought their AI components had detected life beyond Earth, basically what SETI has been trying to do for decades [1].

[1] https://www.seti.org/

[+] Rucadi|1 year ago|reply
That's what NASA wants us to think.
[+] Oarch|1 year ago|reply
How can you be sure? Have you asked them?
[+] ChickeNES|1 year ago|reply
tbh, I didn't think it would be aliens; I thought it would be ITAR-related.
[+] Macuyiko|1 year ago|reply
Very disheartening. HF is doing so much good in the AI community, much more than regulators understand at the moment.
[+] beardedwizard|1 year ago|reply
What does this comment mean? Why is it disheartening? What do regulators have to do with it?