donaldstufft's comments

donaldstufft | 3 years ago | on: Yes, I have opinions on your open source contributions

- https://python-security.readthedocs.io/pypi-vuln/index-2022-... - https://www.mend.io/resources/blog/npm-package-javascript-li...

donaldstufft | 3 years ago | on: Yes, I have opinions on your open source contributions

PyPI doesn't currently prevent deletions at all, so a maintainer can just flat out delete their stuff if they want. Though we are discussing if we want to tighten that up at all.

I would say that in the more than 10 years I've been an admin on PyPI, and the even longer time I've been involved in Python's packaging, I've never once seen anyone but individuals placed front and center as who we're attempting to serve first and foremost.

Businesses and other large orgs do come up from time to time, and we've generally been pretty good at making decisions that allowed those businesses to solve their problems on their own, without requiring the rest of the ecosystem to do it for them. In other words, the features we add for businesses tend to unblock them so they can implement something they want, rather than us taking on the responsibility for whatever thing they want.

donaldstufft | 3 years ago | on: Yes, I have opinions on your open source contributions

All major repositories editorialize to some extent; they would be awful if they did not.

For instance, PyPI will take down your software if we determine it to be malicious in some way, which is extending editorial control over what's on PyPI.

Generally speaking, though, PyPI does not concern itself with the contents of the packages shipped through it as long as they are reasonable packages. If a project wants to break compatibility, then that's on them. We strive to let projects manage themselves, in whatever way they think makes the most sense.

donaldstufft | 7 years ago | on: Malicious Python libraries found and removed from PyPI

Honestly, this is kind of FUDish and assumes the worst case scenario for upstream developers, and the best case scenario for the distro maintainers.

Another way of phrasing what this extra layer of maintainers provides, is a second group of people who can introduce their own irresponsible, user-hostile, and potentially malicious (or at the very least, negligent) decisions. Worse, often times these developers have less (in some cases, far less) knowledge of how the code itself works, and are applying their own patches, often with minimal testing, without fully understanding the scope or impact of the changes they're making. For every poor decision you can find in a package that is popular enough to even appear in one of these downstream repositories, one could just as easily find a case where this extra layer is introducing their own problems.

The non FUD-ish answer is that whether you get your software directly from the upstream developers through an uncurated repository like PyPI, or through a curated repository like a Linux distribution's, neither one is inherently better than the other. Each of them has a variety of pros and cons, and part of modern day engineering is looking at these tradeoffs and choosing the right set for your particular situation. Sometimes that will even mean that you're choosing different tradeoffs for different packages on the same system.

donaldstufft | 8 years ago | on: Malicious software libraries found in PyPI posing as well known libraries

> This is the weakest argument. Are Python devs somehow dumber than Java devs? Are they dumber than Android devs? Are they dumber than iOS devs? Everyone knows how to sign a dependency/app/project except python devs? I don't believe that. I honestly think that's the most insulting aspect of this argument.

Nope, I think they're perfectly capable of signing things. I also think it's silly to ask them to do that when the proposed system hasn't been designed to provide any benefit. Properly designing that system is hard, and 99% of people who go "just use PGP!" or "just use X" have spent exactly zero amount of time doing that. Particularly when the proposed solution doesn't actually solve the problem at hand (though it does solve other problems if it's correctly designed).

Ultimately your "suggestions" are nothing new, they're the same generic, cargo culting, suggestions that folks who haven't looked really hard at the problem tend to make.

donaldstufft | 8 years ago | on: Malicious software libraries found in PyPI posing as well known libraries

> Herd immunity. Someone is out there reviewing it.

More likely everyone assumes someone else is reviewing it, and nobody actually does.

donaldstufft | 8 years ago | on: Malicious software libraries found in PyPI posing as well known libraries

> Key X is on the company approved key list, key y is not. Your argument just fell apart.

A minuscule number of people are going to bother to do something like approve keys. Security for the minority can already be achieved by those companies mandating their developers use DevPI and mirroring trusted projects from PyPI to DevPI (or a similar system).

Complicating the system further for something that, for practical purposes, does not improve the security of the vast bulk of people is not a trade off we're willing to make. Package signing will come to PyPI, likely in the form of TUF which is strictly superior to the trust model provided by PGP for package signing. It hasn't done so because nobody has had the time to do it yet.

What you seem to be missing about my statement, both in the blog post and here, is not that package signing is not worthwhile, but that a lot of people like yourself seem to think that all you need to do is add signatures to a system and suddenly poof it's secure! That viewpoint is common among inexperienced developers or people who don't commonly think too hard about how secure systems are designed/made.

The reality of the situation is that adding signatures is painfully easy, but that without a coherent trust model backing those signatures you've achieved nothing but adding more complexity. Determining a trust model (particularly one that works for the majority) is the hard part, and you can't just wave your hand and wish it better.

> Sonatype has turned this into a rather nice business. It's not a volunteer project for them. You expect me to believe it's impossible despite solid examples to the contrary?

Is it impossible to turn PyPI into a business? I don't suspect it is, no. However, I don't want to do that because my personal risk tolerance doesn't have room for giving up a stable job with health benefits for something that may or may not fail. Others are free to try that if they want of course, but given the lack of people stepping forward to do that, it doesn't seem like anyone else is interested either.

> Blaming the victims.

Stating reality. PyPI is not a curated repository and the end user is responsible for their own security while using it. If they wish to outsource that responsibility there are a number of Linux distributions that are happy to do that for them, as well as companies like Enthought and Continuum Analytics who provide curated repositories.

> It's also not achieved by doing absolutely nothing at all.

Good thing we're not doing nothing at all then. Luckily for the Python community we have actual experts and not armchair cryptographers who fail to understand even the basic fundamentals of developing secure software.

donaldstufft | 8 years ago | on: Malicious software libraries found in PyPI posing as well known libraries

Package signing doesn't achieve anything without a trust model behind it, which is exactly what that post states. Too many people go "we need to add some crypto to this thing!" without developing a threat model and that ends up making the crypto pointless wankery to act as a security blanket without actually solving any problems.

Maven Central, to my knowledge, does not have typo squatting problems because Sonatype has a manual review process for all new projects. It has absolutely nothing to do with the fact that they allow projects to upload PGP signatures and it could not have anything to do with that, because PGP does not provide any mechanism to prevent that.

For example, there may be `urllib3` which is a valid project that must be signed by key X. We'll ignore how a tool like pip would find out that key X is the right key (although this is actually the most important part of a package signing solution) and just grant that we've solved that problem. Someone then comes and registers another project, `urlib3`, which must be signed by key Y. The attack that is being described here is that a user would erroneously say ``pip install urlib3`` when they meant to type ``pip install urllib3``, and pip would then fetch that package and install it. I think it is pretty obvious that signing doesn't help here, because pip doesn't know that the user really wanted urllib3 and not urlib3, so it can only determine that urlib3 is supposed to be signed by key Y (which of course, the hypothetical malicious person controlling urlib3 would have), fetch the package and verify its signature.
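To make the urllib3/urlib3 scenario concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration: the `INDEX` dictionary, the `sign` and `install` helpers, and the use of HMAC as a stand-in for a real signature scheme. It is not how pip or PyPI work; it only shows that per-project verification succeeds for whatever name was typed.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    """Toy signature: HMAC stands in for a real signing scheme (assumption)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


KEY_X = b"legitimate maintainer key"  # hypothetical key for urllib3
KEY_Y = b"attacker key"               # hypothetical key for urlib3

# Hypothetical index: every project has its own key and a validly signed file.
INDEX = {
    "urllib3": {"key": KEY_X, "payload": b"real library"},
    "urlib3": {"key": KEY_Y, "payload": b"malicious code"},
}
for entry in INDEX.values():
    entry["signature"] = sign(entry["key"], entry["payload"])


def install(name: str) -> bytes:
    """Fetch the project the user typed and verify it against that project's key."""
    entry = INDEX[name]
    expected = sign(entry["key"], entry["payload"])
    if not hmac.compare_digest(expected, entry["signature"]):
        raise ValueError("bad signature")
    return entry["payload"]


# The typo'd name verifies cleanly, because it is correctly signed by key Y.
print(install("urlib3"))
```

The check passes for both names, which is the point: the signature only proves who controls `urlib3`, not that the user meant `urllib3`.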

There is only one tried and true method for preventing this kind of human introduced error collision (aka typo squatting) across the board, and that is manual review of all new projects. The problem with manual review then becomes one of scale. As of this writing there are 117,226 unique projects on PyPI, with an average growth of around 100 new projects a day. In addition there are zero full time developers or operations or support people working on PyPI. There is one part time paid person (me), plus my unpaid time, plus one other part time unpaid developer/ops person who do the vast bulk of the work. There is simply not enough available bandwidth to process 100 new projects every day and to validate them for typo squatting/confusion possibilities.

Beyond that, there are a number of possible heuristic based approaches that can try to reduce the chance of this happening, such as using Levenshtein distance, Unicode confusables, attempting to develop "reputation", etc. Most of these are either so broad as to catch a lot of projects which are not typo squatting but are real, actual different things, or are so narrow as to be trivially defeated. That's not to say they aren't worthwhile or that there isn't an idea that would make sense, but focusing on that has not been a priority for a largely volunteer based organization because there are lower hanging fruit that are more impactful, and because at the end of the day, without a manual review system, individual end users are still ultimately responsible for ensuring they're asking for the correct thing (and even beyond that, they're responsible for ensuring that the thing they're asking to be installed is something that satisfies their own security constraints).
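As a rough illustration of the edit distance heuristic, here is a small sketch. The `levenshtein` and `near_misses` functions and the `existing` list are assumptions made up for this example, not anything PyPI actually runs; the threshold of 1 is arbitrary and shows the broad-versus-narrow tension described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character into a
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def near_misses(candidate: str, existing: list[str], max_distance: int = 1) -> list[str]:
    """Hypothetical check: existing names within max_distance edits of a new name."""
    return sorted(
        name for name in existing
        if 0 < levenshtein(candidate, name) <= max_distance
    )


existing = ["urllib3", "requests", "django"]  # sample names for illustration
print(near_misses("urlib3", existing))
```

Raising `max_distance` flags more legitimate, unrelated names, while keeping it at 1 misses anything that differs by two characters, which is why a check like this can only ever be one layer.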

Security is achieved by layering multiple secure systems on top of each other, not by randomly rubbing crypto on things because it makes you feel good to have crypto involved.

donaldstufft | 10 years ago | on: Powering the Python Package Index

It's a few things.

One of the simplest reasons is as you identified: it's easier to get smaller donations from multiple people than it is to get one large donation from a single company (although we do have large donations too, ranging from 30,000/month downwards, and those are for all the PSF infra, not just PyPI).

Another part of that is a lot of this has grown organically over time and we sought out donations from different providers based on our need at the time.

In addition, I can't think of a single company that actually provides everything we need except for maybe Amazon/AWS.

On top of all of that, the more we centralize our donations onto a single company, the more important a single company becomes to PyPI and the larger the amount of risk we take on. It would be a lot harder to find a replacement for all of the things we need all at once than it would be to find a replacement for just a single service.

All in all, managing these accounts is not particularly hard (though in some part that's likely because the set of people who have access to any one of these is pretty static). Most of them provide some sort of standardized API access that doesn't really change based on who is providing said thing. In general, we attempt to rely as much as possible on "Hosted X", where X is some OSS thing we could possibly run ourselves or switch to someone else's "Hosted X" if need be. It's not mandatory, but the harder it would be to switch, the more we factor that into our decision (for instance, our use of S3 is pretty simple, so we don't worry about their proprietary API because it wouldn't be difficult to modify the code to do it differently).

donaldstufft | 11 years ago | on: Incremental Plans to Improve Python Packaging

That already exists, setuptools has supported it for years and years. Nobody uses it though and they prefer to use virtualenv instead. That may be because setuptools itself wasn't that great, or it may be that people just didn't prefer that mechanism for working.

Doing that isn't really much different than a virtual environment though. The only real difference is that in a virtual environment you essentially have "named" (by file system path) sets of dependencies that are automatically "activated" when you start up the Python interpreter. In the setuptools/bundler style you have in memory sets of dependencies that are activated by calling a particular API, often done automatically via a binstub.
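To illustrate the "activated by calling a particular API" idea, here is a minimal sketch in plain Python. The `activate` helper and the throwaway `demo_dep` module are hypothetical and exist only for this example; this is not the setuptools API, just the general mechanism of putting a chosen set of dependency directories on `sys.path` at runtime.

```python
import sys
import tempfile
from pathlib import Path


def activate(paths):
    """Hypothetical helper: put a set of dependency directories first on sys.path."""
    for entry in reversed([str(p) for p in paths]):
        if entry in sys.path:
            sys.path.remove(entry)
        sys.path.insert(0, entry)


# Build a throwaway "dependency" in a temporary directory.
dep_dir = Path(tempfile.mkdtemp())
(dep_dir / "demo_dep.py").write_text("VALUE = 42\n")

# Before activation the module is not importable; afterwards it is.
activate([dep_dir])
import demo_dep

print(demo_dep.VALUE)
```

A virtual environment gets the same effect without the explicit call, because the interpreter works out its search path from where it was started.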

donaldstufft | 11 years ago | on: Incremental Plans to Improve Python Packaging

Oh, and to be clear: when I argued for PEP 453 I was very explicitly against doing anything that meant pip wasn't upgradeable on its own outside of the standard library release cycle.

donaldstufft | 11 years ago | on: Incremental Plans to Improve Python Packaging

distutils being built into the stdlib means that it's not very easy to improve the tooling by improving that module, since it's tied to the Python release and people can't depend on a new Python release for many years.

setuptools isn't tied to the stdlib, though it had many problems and still does. A large portion of what was holding back improvements was that there was nobody really pushing through all the political nonsense surrounding the tooling, and there was nobody to say Yes/No when a consensus couldn't be reached. For normal features there was the PEP process, but the PEP process didn't work for a long time for packaging because Guido admits he really doesn't care much about packaging at all. Now that we have BDFL-Delegates in the form of Nick Coghlan and Richard Jones, and we have people willing to push through changes even when it takes a lot of pain to argue the points, we're finally seeing the engine of progress start to grind forward.

donaldstufft | 11 years ago | on: Incremental Plans to Improve Python Packaging

With Python 3.3 the venv module is now part of the standard library, and the interpreter itself has been modified to support the isolation that virtualenv had to use hacks to make happen. In Python 3.4 the venv module installs pip by default as well.

As one of the maintainers of virtualenv, it's my goal to move that project so that it will use the venv isolation mechanism when it is available, and have virtualenv just provide a level of UX on top of it as well as shims for versions of Python that don't have the venv module.
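For reference, a minimal sketch of driving the standard library venv module from code. The `make_env` helper and the temporary directory are assumptions for this example. It leaves `with_pip` off so that it does not depend on ensurepip being present; passing `with_pip=True` asks venv to bootstrap pip into the new environment.

```python
import tempfile
import venv
from pathlib import Path


def make_env(target: Path, with_pip: bool = False) -> Path:
    """Hypothetical helper: create an environment and return its pyvenv.cfg path."""
    venv.create(target, with_pip=with_pip)
    return target / "pyvenv.cfg"


env_dir = Path(tempfile.mkdtemp()) / "demo-env"
cfg = make_env(env_dir)

# pyvenv.cfg is the marker file the interpreter looks for to enable isolation.
print(cfg.read_text())
```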

donaldstufft | 12 years ago | on: Python 3.4.0 released

Assuming we've released a newer version of pip when 3.4.1 is released :)

donaldstufft | 12 years ago | on: Python 3.4.0 released

Both kind of!

Added to the stdlib was "ensurepip", which is a simple installer for pip. The ensurepip module includes inside of it a copy of pip that it will install from (in other words, ensurepip doesn't hit the network).

There are various reasons why it does this, part of which is to enable easy upgrades to newer versions of pip both inside of CPython itself, and for the end user to upgrade it locally.
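A small sketch of inspecting the bundled copy, assuming nothing beyond the standard library. The `bundled_pip_version` helper is hypothetical; it returns `None` when the ensurepip module is not importable, since some distributions package it separately.

```python
import importlib.util


def bundled_pip_version():
    """Hypothetical helper: version of the pip copy shipped inside ensurepip, if any."""
    if importlib.util.find_spec("ensurepip") is None:
        return None  # some distributions split ensurepip out of the base install
    import ensurepip
    return ensurepip.version()


print(bundled_pip_version())
```

Installing from that bundled copy is what `python -m ensurepip` does, and `python -m ensurepip --upgrade` is the documented way to upgrade an older installed pip to the bundled version.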

donaldstufft | 12 years ago | on: Please reconsider the Boolean evaluation of midnight

You can't document your way out of a usability problem.

donaldstufft | 12 years ago | on: Please reconsider the Boolean evaluation of midnight

It's midnight without a date. However, if there is a timezone attached to the time then it's midnight UTC, unless the UTC offset of the time is a negative value, in which case it's never.
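The rule above can be sketched as a standalone function. `legacy_time_is_false` is a hypothetical reimplementation of the truth test as it was documented before Python 3.5; from 3.5 onwards every `datetime.time` object is considered true, so the built-in `bool()` no longer shows this behaviour.

```python
from datetime import time, timedelta, timezone


def legacy_time_is_false(t: time) -> bool:
    """Hypothetical helper: mimic the pre-3.5 truth test for datetime.time."""
    if t.second or t.microsecond:
        return False
    # Naive times have no offset, so they are compared against zero.
    offset = t.utcoffset() or timedelta(0)
    # False only when hours and minutes equal the UTC offset exactly.
    return timedelta(hours=t.hour, minutes=t.minute) == offset


plus_one = timezone(timedelta(hours=1))
minus_five = timezone(timedelta(hours=-5))

print(legacy_time_is_false(time(0, 0)))                      # naive midnight
print(legacy_time_is_false(time(1, 0, tzinfo=plus_one)))     # midnight UTC, positive offset
print(legacy_time_is_false(time(19, 0, tzinfo=minus_five)))  # midnight UTC, negative offset
```

Because hours and minutes can never be negative, no time with a negative offset can match, which is the "then it's never" case.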

donaldstufft | 12 years ago | on: Please reconsider the Boolean evaluation of midnight

To take it from the same James Coglan -> "You can't add midnight to 3-o'clock in the same way you can't add London to Chicago."

donaldstufft | 12 years ago | on: Redesigned Python.org

The entire site is Open Source. If you rely on obscurity for security then you've done it wrong.

VPN and the like would be nice but hardly required.

donaldstufft | 12 years ago | on: Python on Wheels

A great deal of the pain that Armin experienced is mostly due to the fact that Wheels are very new and aren't very polished yet. Additionally there is a vast wealth of technical debt in the packaging tools so it's quite easy to introduce these kinds of problems because of the general architecture (Something that we're trying to fix long term!).