donaldstufft's comments
donaldstufft | 2 years ago | on: PyPI will require 2FA by the end of 2023
That can be annoying to keep in sync if you have a lot of projects, but we're rolling out organization support to make that easier for people.
donaldstufft | 2 years ago | on: PyPI will require 2FA by the end of 2023
While we strive to make PyPI useful for everyone we totally understand that sometimes the trade offs we have to make just don't work for everyone so we try really hard to enable folks like yourself to be able to set up their own repositories. I'm glad that it's working out for you and that you've got a setup you like.
I do want to mention two things:
We've got a PEP (PEP 708) going through the works that will tighten the security model around multiple repositories down some more. If I understand your uses well enough you should be able to add a line or two of HTML to your repository and not have any interruptions or warnings. That PEP isn't accepted yet or implemented or anything, but something to keep in the back of your mind at least.
While we don't make any sort of raw download logs available, we do publish what is essentially a query-able database of download events that have been parsed already to make it easy to see those stats. We do have a little bit of redaction on those events, primarily to avoid leaking PII like IP addresses and such, where instead of an IP address we log broad geographical area (country I think?).
If anyone is curious to see that, it's hosted in Google BigQuery (sorry, it does require a Google account) and there's a guide at https://packaging.python.org/en/latest/guides/analyzing-pypi... that tells you more about it.
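For a rough idea of what querying that dataset looks like, here is a small sketch that builds a per-country download count query. The table name `bigquery-public-data.pypi.file_downloads` and the `country_code`, `file.project`, and `timestamp` columns are what I recall from the linked guide, so treat them as assumptions and check the guide before relying on them. The helper name `downloads_by_country_query` is made up for this example.

```python
def downloads_by_country_query(days: int = 30) -> str:
    """Build a BigQuery SQL string counting downloads per country.

    The project name is left as the named query parameter @project so it
    can be supplied safely by whatever BigQuery client runs the query.
    """
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return f"""
SELECT country_code, COUNT(*) AS num_downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = @project
  AND DATE(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)
                          AND CURRENT_DATE()
GROUP BY country_code
ORDER BY num_downloads DESC
""".strip()


if __name__ == "__main__":
    print(downloads_by_country_query(7))
```

Restricting the query to a date range matters in practice, since the table is very large and BigQuery bills by the amount of data scanned.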
donaldstufft | 2 years ago | on: PyPI will require 2FA by the end of 2023
I do want to mention though (largely because I think it's pretty cool) that those security devices use a public/private key system under the covers, and they're actually designed to be privacy friendly and phishing resistant. One of the problems with TOTP-based 2FA is that, since it asks users to type the TOTP code into the website, they can be phished: tricked into typing their password and TOTP code into an attacker's website, who then quickly goes and uses them to sign in to their account.
Those hardware tokens prevent that phishing from happening. They basically create, on the fly, a public/private key pair that is bound to the domain name of the site in question, and then give the public key of that pair to the site. When you come back to log in again, the site tells the hardware token what public key it has, the token looks at the site's domain and determines if it has that key for that domain, and if it does it uses a signature to prove ownership of the private key.
It all ends up working really well. Since the domain name (actually the protocol, domain name, and port) is part of the identity of the key pair, it is impossible for the key to get used on the wrong site, so it completely eliminates that kind of phishing. And since every single site gets its own brand new key pair generated for it, there's no way to determine that the hardware token used on Site A is the same as the hardware token used on Site B. So it's entirely privacy preserving as well!
The protocol is obviously a bit more complicated than that, but that's the general idea of it.
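To make the idea concrete, here is a toy simulation of the origin-bound key pairs described above. It is a sketch under heavy assumptions and not the real protocol: the `ToyToken` class and its methods are invented for illustration, the signature scheme is a deliberately simple Schnorr-style construction over a small prime field that is NOT secure, and in the real world it is the browser, not the website, that tells the token which origin is asking.

```python
import hashlib
import secrets

# Toy Schnorr-style signatures. NOT secure; this only stands in for the
# real signature algorithm a hardware token would use.
P = 2**127 - 1  # a Mersenne prime
G = 3
N = P - 1       # exponents can be reduced mod P - 1 (Fermat's little theorem)


def _challenge_hash(r: int, message: bytes) -> int:
    digest = hashlib.sha256(r.to_bytes(16, "big") + message).digest()
    return int.from_bytes(digest, "big") % N


def sign(private_key: int, message: bytes):
    k = secrets.randbelow(N - 1) + 1  # fresh nonce for every signature
    r = pow(G, k, P)
    e = _challenge_hash(r, message)
    return r, (k + private_key * e) % N


def verify(public_key: int, message: bytes, signature) -> bool:
    r, s = signature
    e = _challenge_hash(r, message)
    return pow(G, s, P) == (r * pow(public_key, e, P)) % P


class ToyToken:
    """Hypothetical token that keeps one key pair per origin."""

    def __init__(self):
        self._keys = {}  # origin -> private key

    def register(self, origin: str) -> int:
        # A brand new key pair per origin, so sites cannot correlate tokens.
        private_key = secrets.randbelow(N - 1) + 1
        self._keys[origin] = private_key
        return pow(G, private_key, P)  # public key handed to the site

    def authenticate(self, origin: str, challenge: bytes):
        private_key = self._keys.get(origin)
        if private_key is None:
            return None  # no key for this origin, so nothing to phish
        # The origin is part of what gets signed.
        return sign(private_key, origin.encode() + b"|" + challenge)


if __name__ == "__main__":
    token = ToyToken()
    site = "https://pypi.org:443"
    public_key = token.register(site)
    challenge = secrets.token_bytes(16)

    signature = token.authenticate(site, challenge)
    print(verify(public_key, site.encode() + b"|" + challenge, signature))

    # A look-alike origin has no key, so the token has nothing to offer it.
    print(token.authenticate("https://pypi.org.evil.example:443", challenge))
```

The point of the sketch is the lookup by origin: a phishing site gets no signature at all, and two sites that compare notes only ever see unrelated public keys.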
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
On top of all of that, it's volunteer run and has been understaffed for basically its entire life, so sitting down and figuring out a proper data retention policy that takes a holistic view of everything we have just never bubbled up.
In general I think we already do a pretty good job of collecting a minimal amount of data, and hopefully with proper policies we can do an even better job.
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
Firstly, Debian's mirror network URLs allow a mirror operator to attack the base Debian.org site if they rely on cookies on debian.org (they may not, I'm not sure). Specifically the `ftp.<country>.debian.org` aliases cause this. On PyPI we did use cookies at the base url, so this was a non starter for us to keep.
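The cookie concern comes from how browsers decide which hosts may set a cookie for a given domain. Here is a simplified sketch of that "domain match" rule as I understand it from RFC 6265. It ignores IP addresses and the public suffix list, and the function name `domain_matches` is just for this example.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain matching.

    Returns True when `host` would be allowed to set a cookie whose
    Domain attribute is `cookie_domain`. Real browsers additionally
    reject IP addresses and public suffixes such as "org".
    """
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


if __name__ == "__main__":
    # A mirror served under a subdomain can target the parent domain.
    print(domain_matches("ftp.us.debian.org", "debian.org"))
    # A mirror on its own domain cannot.
    print(domain_matches("mirror.example.net", "debian.org"))
```

This is why requiring mirrors to live on their own domain closes the hole: a host outside the parent domain never domain-matches it.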
The second thing is that, at a technical level, Debian and PyPI are generally similar in how mirrors are configured and hosted. Other than the above aliases, mirrors are expected to have their own domain, and users are expected to configure apt or pip to point to a specific domain. Debian does have a command that will attempt to do that configuration for you too, to make it easier.
The third thing is that Debian's mirrors are as secure as the main repository is against attacks from a compromised mirror operator. This isn't the case in PyPI where you're forced to trust the mirror operator to serve you the correct packages. There is vestigial support for a scheme to support this in the mirroring PEP, but nothing ever really implemented it except the very old version of PyPI (none of the clients, etc). That scheme is also very insecure, so it doesn't really provide the security levels it was intended to.
The fourth thing is that a Debian mirror is easier to operate.
Packages on Debian don't live forever, as new versions are released old versions get removed, and as OS releases move into end of life, entire chunks of packages get rotated out. However on PyPI we don't have the concept of an OS release, or any sort of phasing out of old packages. All packages are valid for as long as the author makes them available. This means that the storage space to run a PyPI mirror (currently ~30TB) is a lot more than the storage space for a Debian mirror (~4TB).
On top of that, the ways apt and pip function are inherently different. Apt has users occasionally download the entire package set so that apt has a local copy of the metadata, while pip asks the server for the metadata of each package (it does some light caching, but not a lot). This means that to discover what packages are available, apt might make one request a day while pip might make 100 requests for every invocation of pip. Packages in apt are also released a lot more slowly and less often than on PyPI, so many times people may not need to download more than a handful of packages from a Debian mirror, but people generally need to download a lot of packages from PyPI at a time.
I believe (?) the Debian mirroring protocol is rsync based, which is generally pretty reliable, while the PyPI mirroring protocol is a custom one which works, but it has a tendency to get "stuck" every few months and require operators to notice and fix it themselves.
I suspect the difference in the strength of the mirror networks comes from some combination of the above, but I suspect the third and fourth things are the biggest differences, particularly since PyPI's CDN solved the problem that, in most users' minds, would cause them to want to host or use a mirror.
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
The requirement for having individual keys signed by Debian Developers just makes it easier for the archive administrators to decipher which keys they want to add to their root of trust. The upload system does not check those signatures at all, they do not need to exist in the slightest as far as the upload system is concerned.
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
But let's step back a moment and presume that they do have that ability to compel. The first step here is that none of the PyPI Administrators are the legal owners of PyPI, so such an order would not be sent to any of us, but rather to the PSF itself. The PSF would then be on the hook to either comply or fight said hypothetical order, but individual members of the administration team would not be, and would be free to quit. They may not be able to say why they've quit, but quitting AFAIK would be entirely possible.
The PSF, while not having Apple's war chest, does retain counsel for dealing with things like this, and I can say personally I'd spend myself broke before I'd be willing to do so.
We are going to be implementing signing, and I'm hoping we'll be able to make strong progress on that soon.
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
This is not how signing works in Debian at a technical level. At a technical level, uploading to Debian requires them to add your key to a list of keys maintained by the archive administrators. As a matter of policy those administrators ask you to get your key signed by an existing Debian Developer, but at no point does their upload infrastructure check that or use the Web of Trust.
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
PyPI still fully supports mirrors (though it is becoming increasingly hard to run a full mirror of PyPI, last I looked a full copy of PyPI is about 30TB).
The only thing we ever removed was designating any particular mirror as official and an auto discovery protocol that was quite frankly extremely insecure and slow. That worked by giving every single mirror that wanted to be an "official" mirror for auto discovery a subdomain of `pypi.python.org`, labeled {a-z}.pypi.python.org. A client would determine what mirrors were available by querying last.pypi.python.org, which was a CNAME pointing to the last letter that we had assigned, that would tell it how many mirrors there were, then they could work backwards from that letter. So if the CNAME pointed to c.pypi.python.org, the client would know that a, b, and c existed.
Immediately you should be able to see a few problems with this:
- It is grossly insecure. Subdomains of a domain can set cookies on the parent domain, depending on ~things~ they can also read cookies.
- It does not scale past having 26 mirrors.
- It does not support removing a mirror, there can be no gaps in the letters.
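The discovery step described above can be sketched in a few lines. This is only an illustration of the letter scheme, not the code any client actually used: the DNS lookup is left out, and the function name `mirrors_from_last` is made up for this example.

```python
import string


def mirrors_from_last(cname_target: str, base: str = "pypi.python.org") -> list:
    """Expand the CNAME target of last.<base> into the full mirror list.

    `cname_target` is whatever last.pypi.python.org pointed at, for
    example "c.pypi.python.org". Performing the DNS query itself is
    out of scope for this sketch.
    """
    label, _, rest = cname_target.rstrip(".").partition(".")
    if rest != base or len(label) != 1 or label not in string.ascii_lowercase:
        raise ValueError(f"unexpected mirror name: {cname_target!r}")
    last_index = string.ascii_lowercase.index(label)
    # Every letter up to and including the last one is assumed to exist,
    # which is why the scheme cannot have gaps or more than 26 mirrors.
    return [f"{letter}.{base}" for letter in string.ascii_lowercase[: last_index + 1]]


if __name__ == "__main__":
    print(mirrors_from_last("c.pypi.python.org"))
```

Because the client derives the whole list from a single letter, there is no way to express "b is gone" other than pointing b at something else.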
So we needed to remove that auto discovery mechanism, which raised the question of what, if anything, we should replace it with?
Well, at the time we had only ever made it up to g.pypi.python.org, so there were only 7 mirrors in total that ever asked to become an official mirror. To my knowledge we never reused a letter; if a mirror went away we would just point that mirror's name back at the main PyPI instance. I don't remember exactly, but my email references there being only 4 mirrors left.
From my memory at the time, most of those 4 mirrors were regularly hours or days behind PyPI, would regularly go offline, etc.
But again, we never stopped anyone from running a mirror, we just removed the auto discovery mechanism and required them to get their own domain name. We even linked to a third party site that would index all of the servers and keep track of how "fresh" they were, and other stats (at least until that site went away).
Running a mirror of PyPI is a non trivial undertaking, and most people simply don't want to do that. We never had many mirrors of PyPI running, and as it turns out, once we improved PyPI most people decided they simply didn't care to use a mirror and preferred to just use PyPI. Still, to this day we support anyone who wants to mirror us.
donaldstufft | 2 years ago | on: PyPI Was Subpoenaed
I haven't explicitly asked, but I would be very surprised if any of the other PyPI admins felt differently.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
PyPI should implement it though, because fundamentally the question of who is authorized to release for "requests" on PyPI is a question of who PyPI authorizes to release for that.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
In the cases that it is used, AFAIK it is only used by Debian's uscan program, which is sort of like the Debian version of Dependabot, it tells them when there is a new version of something to package. As far as I know, the process of packaging that new version is still manual, and relies on the maintainer downloading the package and packaging it, so they may or may not use the signature in that case.
How useful this is, is up for debate. Many years ago when I first started taking over releasing pip, that caused the pip GPG key to change, and the reaction of the Debian maintainer at the time was to just comment out the signature bit and fall back to no signature.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
There's also a general consensus (not documented) that sigstore will play some kind of role here. Possibly in-toto as well?
In the 10 years since my post that you referenced, we've laid some decent plans I believe, and have just slowly been working on them, to the extent that we've been able to given our own time constraints.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
PyPI Administrator here, and the person who removed GPG from PyPI.
All the way back in 2013 I had written blog posts that talked about how GPG was not sufficient for a secure package signing scheme in a repository.
I first proposed removing GPG back in May of 2016 (turns out May is a bad month for GPG in my world). At that time we were knee deep in rewriting PyPI into its modern incarnation and trying to quickly identify what features were actually important enough to keep in the new implementation and what features were not.
Even back in 2016 I did not think that the level of use of GPG and the relative uselessness of the signatures made sense to keep it as a feature. However when I proposed it we got some small amount of push back, primarily from Linux distributors, and the feature had already been implemented so we just removed it from the UI and left the feature in. This wasn't an endorsement of the feature, but rather a tactical choice that it wasn't worth spending more time on removing GPG at that point when we were focused on the rewrite.
In the intervening years it had periodically come up, everyone had agreed that it wasn't part of our long term plans, but nobody had the time to dig into figuring out if the signatures that were being uploaded were actually useful and without that, there was some vague concern that maybe somewhere out there some system might be relying on them, and not wanting to "pick a fight" over it at that time.
Then woodruffw did the work to investigate how useful the existing signatures actually were, and quite frankly the numbers were worse than I expected. I honestly expected most of the existing signatures to be meaningfully verifiable, because from my perspective, the only people left signing were likely going to be people who were invested in GPG, and thus more likely to spend the time to make sure that everything was working.
Given that new information, along with a long-standing desire (over 7-10+ years now!) to remove this small bit of security theater, I went ahead and threw together a pull request to actually do it now. Like a lot of things in OSS, it was a perfect storm of someone pointing out a problem to someone who had enough time and motivation at that point in time to fix it, which made that particular task bubble up to the top of my long TODO list.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
Implementing those things takes time though.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
However sigstore does not solve the question of ensuring that a package is coming from the person(s) you expect it to.
donaldstufft | 2 years ago | on: Removing PGP from PyPI
If some class of users cannot use whatever signing solution we come up with, then we'll figure out an option for them or we'll scrap the solution completely.