top | item 34930419

(no title)

almet | 3 years ago

It's still the same story : PyPI still doesn't have a way to automatically detect interactions with the network and the filesystems for the submitted packages. It's a complex thing to do for sure, but that would be a welcome addition, I guess.

discuss

order

woodruffw|3 years ago

PyPI still doesn't have this because no packaging ecosystem does. It's impossible to do in the general case if your packaging schema allows arbitrary code execution, which Python (and Ruby, and NPM, and Cargo, etc.) allow.

The closest thing is pattern/AST matching on the package's source, but trivial obfuscation defeats that. There's also no requirement that a package on PyPI is even uploaded with source (binary wheel-only packages are perfectly acceptable).

spenczar5|3 years ago

"no packaging ecosystem does."

This is a little bit too strong, since packaging doesn't require arbitrary code execution. For example, Go doesn't permit arbitrary code execution during `go get`. Now - there have been bugs which permit code execution (like https://github.com/golang/go/issues/22125) but they are treated as security vulnerabilities and bugs.

Of course, you're right about Python.

eigenvalue|3 years ago

This seems eminently solvable though. Why can’t every package submission cause some minimal sandboxed docker image to install the package and call the various functions and methods and log all network and disk activity? If anything looks suspicious it would be denied and the submitter would have to appeal it, explaining why the submission is valid. The same applies for NPM and Cargo. I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start. This seems like the kind of thing that wouldn’t even cost all that much, and big corporate users of python would stand to benefit.

blibble|3 years ago

> It's impossible to do in the general case if your packaging schema allows arbitrary code execution

Java's type system: ClassLoaders plus SecurityManager was impossible?

that's literally how Java applets worked, enforced through the type system

https://docstore.mik.ua/orelly/java-ent/security/ch03_01.htm

yes, SecurityManager was a poor implementation for many reasons, but it's definitely not "impossible" to sandbox downloaded code from the network while having it interact with other existing code, you can do it with typing alone

almet|3 years ago

I'm not sure it's not do-able, actually. What about having an execution sandbox and a way to check the calls made during the execution of the install script for instance?

I worked a few years back on something like this but it went nowhere, but I still believe it would be doable and useful. The only trace I found back is https://wiki.python.org/moin/Testing%20Infrastructure, which contains almost no info...

photon12|3 years ago

Smart attackers are already/will add `sleep(SOME_NUMBER_LONGER_THAN_SCAN_SANDBOX_LIFETIME)` before anything that does FS or network access. Not to say that this wouldn't be a welcome addition, but the scanning needs to be understood in the context of the inherent limitations of large scale runtime behavior detection of packages when you have a fixed amount of hardware and time for running those scans.