Safetensors closes the RCE hole, but it doesn't address legal liability. I scan .safetensors files because their metadata headers often carry restrictive licenses (such as CC-BY-NC) that contradict the repo's README. Deploying a non-commercial model in a commercial SaaS is a security/compliance incident even if no code is ever executed (PS: I'm in the EU, where this matters a lot).
Additionally, a massive portion of the ecosystem is still stuck on Pickle/PyTorch .bin.
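To make the metadata point concrete, here is a minimal sketch of reading the header without loading any tensors. It relies only on the documented file layout (an 8-byte little-endian length, then a JSON header with an optional `__metadata__` map). The `license` key and the `RESTRICTIVE_MARKERS` list are my own assumptions for illustration; metadata keys are free-form, so a real scanner would need to check whatever keys the publisher actually used.

```python
import json
import struct

# Hypothetical policy: substrings that mark a license as non-commercial.
RESTRICTIVE_MARKERS = ("-nc", "noncommercial", "non-commercial")


def read_safetensors_metadata(path):
    """Return the free-form __metadata__ dict from a .safetensors header."""
    with open(path, "rb") as f:
        # The first 8 bytes are an unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header itself is JSON; tensor data follows and is never read here.
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})


def find_restrictive_license(metadata):
    """Return the license string if it looks non-commercial, else None."""
    # "license" is an assumed key name; metadata keys are not standardized.
    value = metadata.get("license", "")
    lowered = value.lower()
    if any(marker in lowered for marker in RESTRICTIVE_MARKERS):
        return value
    return None
```

A CI job could call `find_restrictive_license(read_safetensors_metadata(path))` on each downloaded artifact and compare the result against the license declared in the README.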
Right, but in these environments (PS: I'm also in the EU and also work in this ecosystem) you don't just deploy third-party data willy-nilly. You take some form of ownership of the data, review and polish it, and then deploy that. Since security and compliance are important to you, I assume you're doing the same?
And when you're doing that, you have plenty of opportunity to turn Pickle into whatever format you want, since you're holding and owning the data anyway.
Don't you think that in a large company, with 50+ devs and data scientists pulling models for experiments, enforcing a manual "review+polish+convert" workflow for every single artifact creates a massive bottleneck and, as a result, shadow IT?
Doesn't it make sense to automate the "review" part?
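As a rough sketch of what automating the "review" part could look like for Pickle files: the standard library's `pickletools.genops` walks the opcode stream without executing it, so you can list which globals a file would import before anyone calls `load`. The allowlist below is hypothetical, and the `STACK_GLOBAL` handling is a heuristic (it assumes the two most recent string pushes are the module and name, which memo lookups can defeat). It also assumes raw pickle bytes; as far as I know, newer `torch.save` output is a zip archive, so the inner pickle would need to be extracted first.

```python
import pickletools

# Hypothetical allowlist of top-level packages a weights file may reference.
ALLOWED_TOP_LEVEL = {"torch", "collections", "numpy"}

# Opcodes that push a text string onto the pickle stack.
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data):
    """List (module, name) imports a pickle would perform, without loading it."""
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: treat the last two pushed strings as module and name.
            module, name = recent_strings[-2:]
            found.append((module, name))
    return found


def disallowed_globals(data):
    """Return referenced globals whose top-level package is not allowlisted."""
    return [
        (module, name)
        for module, name in referenced_globals(data)
        if module.split(".")[0] not in ALLOWED_TOP_LEVEL
    ]
```

Anything returned by `disallowed_globals` would be a reason to block the artifact or route it to a human. An allowlist by package alone is not a security guarantee, only a first filter.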
embedding-shape|1 month ago
arseniibr|1 month ago