> In the last 3 years, about 50k signatures had been uploaded to PyPI by 1069 unique keys. Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19)
Why not include the public key in the package?
99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.
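A minimal sketch of that "same people as before" check, assuming the public key ships alongside each release. The names `fingerprint`, `check_release`, and `example-pkg` are hypothetical, and verifying the signature against the key is left out; this only shows the pinning step.

```python
import hashlib


def fingerprint(public_key: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw public key bytes."""
    return hashlib.sha256(public_key).hexdigest()


def check_release(pins: dict[str, str], package: str, public_key: bytes) -> str:
    """Trust-on-first-use check for the key shipped with a release.

    Returns "pinned" the first time a package is seen, "ok" when the key
    matches the pinned one, and "changed" when it does not.
    """
    fp = fingerprint(public_key)
    if package not in pins:
        pins[package] = fp  # first use: remember this key for later releases
        return "pinned"
    return "ok" if pins[package] == fp else "changed"


pins: dict[str, str] = {}
first = check_release(pins, "example-pkg", b"key-A")   # first install
second = check_release(pins, "example-pkg", b"key-A")  # upgrade, same key
third = check_release(pins, "example-pkg", b"key-B")   # upgrade, different key
```

A "changed" result does not say who is right; it only flags that the publisher's key differs from the one seen earlier, which is the property asked for above.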
> Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures.
I don't know if it applies to any of those 1069 keys, but note that there is a way of hosting PGP keys that does not depend on key servers: WKD https://datatracker.ietf.org/doc/draft-koch-openpgp-webkey-s... . You host the key at a .well-known URI under the domain of the email address. It's a draft as you can see, but I've seen a few people using it (including myself), and GnuPG supports it.
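For reference, a rough sketch of how a client could derive that .well-known URI from an email address, following the "direct method" as I understand it from the WKD draft: the lowercased local part is hashed with SHA-1 and encoded as z-base-32. The function names are my own, and the draft's "advanced method" (an `openpgpkey.` subdomain) is not covered.

```python
import hashlib
from urllib.parse import quote

# z-base-32 alphabet used for the hashed local part.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32(data: bytes) -> str:
    """Encode bytes as z-base-32, 5 bits per character, most significant bit first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # pad to a multiple of 5 bits
    return "".join(ZBASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def wkd_direct_url(email: str) -> str:
    """Build the direct-method WKD URL for an email address."""
    local, domain = email.rsplit("@", 1)
    digest = hashlib.sha1(local.lower().encode("utf-8")).digest()
    return (
        f"https://{domain.lower()}/.well-known/openpgpkey/hu/"
        f"{zbase32(digest)}?l={quote(local)}"
    )


url = wkd_direct_url("Joe.Doe@example.org")
```

The key is then fetched over HTTPS from that URL, so the binding is only as strong as control of the email domain.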
This is interesting, but it doesn't really solve the key distribution problem: with well-known hosting you now have a (weak) binding to some DNS path, but you're not given any indication of how to discover that path. It's also not clear that a DNS identity is beneficial to PyPI in particular (since PyPI doesn't associate or namespace packages at all w/r/t DNS).
More generally, these kinds of improvements are not a sufficient reason to retain PGP: even with a sensible identity binding and key distribution, PGP is still a mess under the hood. The security of a codesigning scheme is always the weakest signature and/or key, and PGP's flexibility more or less ensures that that weakest link will always be extremely weak.
Isn't that throwing out the baby with the bathwater? There seem to be non-negligible risks of installing malware from PyPI according to various headlines recently. But instead of improving security measures that don't work well they just remove them?
Removing security features that don't work is a separate concern from making security features that do work. Nobody who has done any serious work on PyPI security in the past 15 years thinks that GPG will play a part in the future of PyPI security. Its support was entirely vestigial, served no practical purpose, and never would.
Most supply chain attacks rely on dependency confusion or typo-squatting, which PGP signing doesn't solve. An attacker can PGP sign their typosquatted package, and the package manager won't know to alert you because as far as it can tell, you intended to install that package. (This is before even considering whether the packages are signed with strong keys, or users are actually verifying them against any public trust store.) That's one reason supply chain issues are so pernicious - they're more of a human problem than a technical one.
That said, I do agree with your premise that the limited usefulness of PGP signing doesn't necessitate removing the feature entirely.
There are many cases where it's better to know you don't have something than to wrongly think you do. Security is certainly one.
So they examined everything uploaded to PyPi with a signature over three years, including old versions, and classify those packages whose signing key is expired today, possibly years later, as "impossible to meaningfully verify." Never mind that the package may have been verifiable with a valid key for a full year or two before the key expired, and in the meantime may have been superseded by a newer version.
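To make the distinction concrete, here is a small sketch that checks a key's validity window at two different moments. The key dates and the signing date are hypothetical; only the audit date comes from the post. Note the caveat that a signature's creation time is asserted by the signer, so a real verifier cannot rely on it alone.

```python
from datetime import datetime, timezone
from typing import Optional


def key_valid_at(created: datetime, expires: Optional[datetime], when: datetime) -> bool:
    """Return True if a key with this validity window was usable at `when`."""
    if when < created:
        return False
    return expires is None or when < expires


# Hypothetical key: created 2020-01-01, expired 2022-01-01.
created = datetime(2020, 1, 1, tzinfo=timezone.utc)
expires = datetime(2022, 1, 1, tzinfo=timezone.utc)

signed_on = datetime(2021, 6, 1, tzinfo=timezone.utc)    # hypothetical release date
audited_on = datetime(2023, 5, 19, tzinfo=timezone.utc)  # audit date from the post

valid_when_signed = key_valid_at(created, expires, signed_on)
valid_at_audit = key_valid_at(created, expires, audited_on)
```

The same signature counts as verifiable or not depending on which of the two timestamps the check uses.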
They also say they can't "meaningfully verify" packages if the key does not have "binding identity information," by which they presumably mean automatically verifiable binding identity information, which usually means someone verified an email from keys.openpgp.org. This is a really narrow way to establish "binding identity information." For example someone who is a PyPi author and publicly links their PGP key from a (https) website on the same domain as the email on the key would not count. A well known longtime PyPi author with a well known key would not count.
The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.
This has the smell of "we didn't want to bother supporting PGP any more because it's hard so we came up with an excuse."
No need for an excuse, though: Just be honest about it and let the chips fall where they may, if you really don't want to support PGP. God knows there are valid reasons for not having the energy to deal with PGP. (FWIW I think it's a good solution for packages, for those who can navigate the tooling, but on the other hand I'm not volunteering my time to run PyPi.)
P.S. There is a link in their post saying PGP has "documented issues." The specific issue described in the linked document is "packaging signing is not the holy grail" and a list of known things about PGP, like that verification of keys is ad hoc. It also concludes that there is no known better alternative.
> The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.
This is revisionist: in 2005, PGP was approachingly modern and represented an acceptable tradeoff between usability, legal and patent constraints, and arms laws. It was also accompanied by a network of synchronizing keyservers and a "strong set" within the Web of Trust that, in principle, gave you transitive authenticity for artifacts. That never really worked as expected, but it's all code and infrastructure that was actually running in 2005, when PyPI chose to allow PGP signatures.
None of that is the case in 2023: PGP is 20 years behind cryptographic best practices, and has 30 years of unresolved technical debt. There is no web of trust, and the synchronizing keyserver network has been broken for years.
The argument for PGP in 2005 was that it was, to a first approximation, the best that could be done. The argument against PGP in 2023 is that, to a first approximation, it's worse than useless by virtue of providing false security guarantees.
> Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19).
so...*reject those packages*. if you use a PGP key that isn't properly available or verifiable, reject it. That way every package with a PGP key will have 100% "key is properly discoverable" rate.
it's not really reasonable to just drop this feature because most packages don't use it. Packages with tens of millions of downloads (like mine) make up a small percentage of total packages, but this small number of packages makes up a huge proportion of actual downloads, and package signing is most useful for these kinds of packages.
if the adoption of "proper PGP keys" were ranked by packages/ downloads rather than "packages" alone, these rates would be much different.
Looking at the top 20 packages in the last month by download (packages with hundreds of millions of downloads), only 1 of them shipped a GPG signature with their most recent release. I haven't asked the author of that one, but I do know them and I suspect they agree with the idea that it's not a valuable thing and they do it largely because it exists.
I don't understand how Java can get this right with Maven Central and co but newer languages can't.
Having a slight barrier to entry which is essentially "you must learn why signing is important for users of your library and this is how to do it" a) really isn't that bad, b) doesn't result in fewer quality packages being uploaded, and c) if it acts as any sort of filter, that seems to be a good thing.
Maven Central isn't short of high quality packages and no high quality OSS Java libraries are missing so the filter aspect isn't culling anything important.
Java, Apt, RPM, etc all have this and have absolutely gigantic numbers of packages so the argument that it's too hard really just doesn't hold water.
I don't believe that Maven Central's use of GPG is providing a meaningful security control here, so I would dispute the idea that they're doing it "right".
Now that you have removed GPG ASC signature upload support, is there any way for publishers to add cryptographic signatures to packages that they upload to pypi? FWIU only "the server signs uploads" part of TUF was ever implemented?
Why do we use GPG ASC signatures instead of just a checksum over the same channel?
> Why do we use GPG ASC signatures instead of just a checksum over the same channel?
Could you elaborate on what you mean by this? PyPI computes and supplies a digest for every uploaded distribution, so you can already cross-check integrity for any hosted distribution.
GPG was nominally meant to provide authenticity for distributions, but it never really served this purpose. That's why it's being removed.
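As a sketch of that integrity cross-check: hash the downloaded bytes and compare against the digest the index publishes. The payload and the "published" digest below are stand-ins rather than real PyPI data. This covers integrity only; it says nothing about who produced the file.

```python
import hashlib
import hmac


def matches_digest(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the SHA-256 of `data` equals the published hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


payload = b"example distribution bytes"           # stand-in for a downloaded file
published = hashlib.sha256(payload).hexdigest()    # stand-in for the index's digest

intact = matches_digest(payload, published)
tampered = matches_digest(payload + b"!", published)
```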
When many developers didn't use 2FA they pushed for them to enable 2FA within a deadline. It sounds like the same approach could have been used for PyPI. E.g.: an attempt to make the feature useful before declaring it dead forever.
This has very little to do with 2FA: PGP signing has been de facto dead for years on PyPI, and this change has no effect on publishing workflows: PyPI will still accept uploads that contain signatures, and just ignores them now.
It's also not accurate to say that PyPI failed to make 2FA useful: it was deployed for over two years before the 2FA mandate for critical projects went into effect. That mandate also came with free hardware keys for everyone affected.
I came here thinking they were removing the PGP package from PyPi, but they're just removing a barely-used signature system? I don't know why they have to remove it though. I doubt it requires much maintenance now that it's already in place.
Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.
> While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.
At least you can't blame pypi for ignoring the report, and tbh I find this response time remarkably quick. It wouldn't have been far fetched to imagine someone in their position just trying to ignore/downplay/dispute this sort of report.
I don't understand the argument. Isn't the whole point of PGP establishing some kind of chain of trust? If pypi.org has its own public key, it could sign a few major distributors' keys, and for smaller/individual packages I could either choose to always trust the same public key or not use the package. It's not a centralized system to begin with. It's not pypi.org's responsibility to identify and verify that all the keys belong to whom they claim to belong. Pypi.org's inability to verify individual identities shouldn't impact the overall usefulness of PGP for package distribution and verification.
Interesting timeline. The Yossarian article that TFA cites and that I assumed was the impetus here was published two days ago on 5/21. But the audit was two days earlier on 5/19.
I originally ran the audit on 3/27 (IIRC), and then ran it a few additional times as I fixed data quality issues in my scripts (the ones linked in the post). The last time I ran it was on 5/19-5/20, when I was finalizing the post. You can also see that I did a new release of `pgpkeydump` at around the same time, to add some more extracted datapoints.
PyPI's admins have been wanting to remove PGP support for years; all I did was provide the final nudge.
I have been thinking about this in the context of Java libraries (really using Scala, but bear with me).
If the repo requires a GPG signature, they could also ask for the public key of the developer making the releases (e.g. when they make the account), and they could sign it with their key at that point.
Then make available the package, the signature, and the signed public key. Then I only need to trust the repo's key (in this case PyPi).
It makes sense in terms of trusting the package index, but it's inverted from the original design goal: the point of end-user signatures on package indices is to eliminate unnecessary package index trust, not reinforce it.
If you already trust the package index, then mandating HTTPS and strong cryptographic digests is going to be far more effective (and secure) than some kind of PGP key attestation scheme.
Just for disambiguation: ECDSA is a signing algorithm, not a protocol or toolkit like PGP. PGP can produce ECDSA signatures through an extension RFC, but it's not a core part of OpenPGP.
There is no immediate replacement, because the overwhelming majority of packages never bothered to sign with PGP (and all evidence points to the overwhelming majority of signatures never being verified). In other words, this is much closer to removing "dead" code than to killing an active feature.
Longer term, the plan is to integrate Sigstore[1]-based signatures.
PGP is a solution in search of a problem. We have given it decades for it to be useful and it turns out that it is an enormous security failure. It needed to go.
Sigstore [0] on the other hand makes more sense to use instead.
This reads like an advertisement. I routinely use GPG, and it is useful for me. It's not perfect (far from perfect, really), but it's a solution for multiple of my problems.
I don't know much about the solution you promote, but as usual with many "PGP killers" it replaces one very specific application of PGP and ignores all the others. Which is ok! Doing one thing and doing it well is the Unix philosophy after all. But it's not something I have use for, and it's not a viable replacement for GPG.
Trust on first use is absolutely a valid use of PGP signatures that is being used in many real world systems (ask me how I know). You finding that PGP isn't being used the way you think it should does not justify removing it without providing a replacement.
Why on earth wasn't the community asked before you implemented this change?
> Given all of this, the continued support of uploading PGP signatures to PyPI is no longer defensible. While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.
[+] [-] tzs|2 years ago|reply
Why not include the public key in the package?
99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.
[+] [-] Arnavion|2 years ago|reply
I don't know if it applies to any of those 1069 keys, but note that there is a way of hosting PGP keys that does not depend on key servers: WKD https://datatracker.ietf.org/doc/draft-koch-openpgp-webkey-s... . You host the key at a .well-known URI under the domain of the email address. It's a draft as you can see, but I've seen a few people using it (including myself), and GnuPG supports it.
[+] [-] woodruffw|2 years ago|reply
More generally, these kinds of improvements are not a sufficient reason to retain PGP: even with a sensible identity binding and key distribution, PGP is still a mess under the hood. The security of a codesigning scheme is always the weakest signature and/or key, and PGP's flexibility more or less ensures that that weakest link will always be extremely weak.
[+] [-] Avamander|2 years ago|reply
[+] [-] usr1106|2 years ago|reply
[+] [-] donaldstufft|2 years ago|reply
[+] [-] chatmasta|2 years ago|reply
That said, I do agree with your premise that the limited usefulness of PGP signing doesn't necessitate removing the feature entirely.
[+] [-] masklinn|2 years ago|reply
That assumes there’s a baby in the bath water.
> But instead of improving security measures that don't work well they just remove them?
Well yes, “security measures” which don’t work are usually worse than nothing.
[+] [-] Brian_K_White|2 years ago|reply
[+] [-] eduction|2 years ago|reply
They also say they can't "meaningfully verify" packages if the key does not have "binding identity information," by which they presumably mean automatically verifiable binding identity information, which usually means someone verified an email from keys.openpgp.org. This is a really narrow way to establish "binding identity information." For example someone who is a PyPi author and publicly links their PGP key from a (https) website on the same domain as the email on the key would not count. A well known longtime PyPi author with a well known key would not count.
The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.
This has the smell of "we didn't want to bother supporting PGP any more because it's hard so we came up with an excuse."
No need for an excuse, though: Just be honest about it and let the chips fall where they may, if you really don't want to support PGP. God knows there are valid reasons for not having the energy to deal with PGP. (FWIW I think it's a good solution for packages, for those who can navigate the tooling, but on the other hand I'm not volunteering my time to run PyPi.)
P.S. There is a link in their post saying PGP has "documented issues." The specific issue described in the linked document is "packaging signing is not the holy grail" and a list of known things about PGP, like that verification of keys is ad hoc. It also concludes that there is no known better alternative.
[+] [-] woodruffw|2 years ago|reply
This is revisionist: in 2005, PGP was approachingly modern and represented an acceptable tradeoff between usability, legal and patent constraints, and arms laws. It was also accompanied by a network of synchronizing keyservers and a "strong set" within the Web of Trust that, in principle, gave you transitive authenticity for artifacts. That never really worked as expected, but it's all code and infrastructure that was actually running in 2005, when PyPI chose to allow PGP signatures.
None of that is the case in 2023: PGP is 20 years behind cryptographic best practices, and has 30 years of unresolved technical debt. There is no web of trust, and the synchronizing keyserver network has been broken for years.
The argument for PGP in 2005 was that it was, to a first approximation, the best that could be done. The argument against PGP in 2023 is that, to a first approximation, it's worse than useless by virtue of providing false security guarantees.
[+] [-] tptacek|2 years ago|reply
[+] [-] rwmj|2 years ago|reply
[+] [-] zzzeek|2 years ago|reply
so...*reject those packages*. if you use a PGP key that isn't properly available or verifiable, reject it. That way every package with a PGP key will have 100% "key is properly discoverable" rate.
it's not really reasonable to just drop this feature because most packages don't use it. Packages with tens of millions of downloads (like mine) make up a small percentage of total packages, but this small number of packages makes up a huge proportion of actual downloads, and package signing is most useful for these kinds of packages.
if the adoption of "proper PGP keys" were ranked by packages/ downloads rather than "packages" alone, these rates would be much different.
[+] [-] donaldstufft|2 years ago|reply
Looking at the top 20 packages in the last month by download (packages with hundreds of millions of downloads), only 1 of them shipped a GPG signature with their most recent release. I haven't asked the author of that one, but I do know them and I suspect they agree with the idea that it's not a valuable thing and they do it largely because it exists.
[+] [-] jpgvm|2 years ago|reply
Having a slight barrier to entry which is essentially "you must learn why signing is important for users of your library and this is how to do it" a) really isn't that bad, b) doesn't result in fewer quality packages being uploaded, and c) if it acts as any sort of filter, that seems to be a good thing.
Maven Central isn't short of high quality packages and no high quality OSS Java libraries are missing so the filter aspect isn't culling anything important.
Java, Apt, RPM, etc all have this and have absolutely gigantic numbers of packages so the argument that it's too hard really just doesn't hold water.
Doing so requires reading/understanding these ~3 pages of docs: https://central.sonatype.org/publish/requirements/gpg/
[+] [-] B1FF_PSUVM|2 years ago|reply
Python (1991) is older than Java (1995)
(irrelevant factoid, but still ...)
[+] [-] donaldstufft|2 years ago|reply
[+] [-] blibble|2 years ago|reply
it's the magic combination of pushing their own agenda (vs. that of their users), mixed with ineptitude
[+] [-] westurner|2 years ago|reply
Why do we use GPG ASC signatures instead of just a checksum over the same channel?
[+] [-] woodruffw|2 years ago|reply
Could you elaborate on what you mean by this? PyPI computes and supplies a digest for every uploaded distribution, so you can already cross-check integrity for any hosted distribution.
GPG was nominally meant to provide authenticity for distributions, but it never really served this purpose. That's why it's being removed.
[+] [-] ilyt|2 years ago|reply
Of course, if you haven't put any effort into a system that verifies end-to-end whether it's the right signature, it doesn't matter.
[+] [-] jwilk|2 years ago|reply
[+] [-] WhyNotHugo|2 years ago|reply
[+] [-] woodruffw|2 years ago|reply
It's also not accurate to say that PyPI failed to make 2FA useful: it was deployed for over two years before the 2FA mandate for critical projects went into effect. That mandate also came with free hardware keys for everyone affected.
[+] [-] masklinn|2 years ago|reply
Distributions don’t use it, therefore it’s worthless, just overhead and technical debt.
[+] [-] bryanlyon|2 years ago|reply
Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.
[+] [-] tedivm|2 years ago|reply
> While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.
[+] [-] Avamander|2 years ago|reply
Discoverable. That does not really verify anything about the key, its identities or the supposed signer.
It boils down almost entirely to just an overcomplicated hashing system.
[+] [-] zokier|2 years ago|reply
[+] [-] masklinn|2 years ago|reply
The post gave them excellent additional justification to.
[+] [-] jxy|2 years ago|reply
[+] [-] NotYourLawyer|2 years ago|reply
[+] [-] woodruffw|2 years ago|reply
PyPI's admins have been wanting to remove PGP support for years; all I did was provide the final nudge.
[+] [-] reidrac|2 years ago|reply
If the repo requires a GPG signature, they could also ask for the public key of the developer making the releases (e.g. when they make the account), and they could sign it with their key at that point.
Then make available the package, the signature, and the signed public key. Then I only need to trust the repo's key (in this case PyPi).
Does this make any sense?
[+] [-] woodruffw|2 years ago|reply
It makes sense in terms of trusting the package index, but it's inverted from the original design goal: the point of end-user signatures on package indices is to eliminate unnecessary package index trust, not reinforce it.
If you already trust the package index, then mandating HTTPS and strong cryptographic digests is going to be far more effective (and secure) than some kind of PGP key attestation scheme.
[+] [-] KRAKRISMOTT|2 years ago|reply
[+] [-] woodruffw|2 years ago|reply
There is no immediate replacement, because the overwhelming majority of packages never bothered to sign with PGP (and all evidence points to the overwhelming majority of signatures never being verified). In other words, this is much closer to removing "dead" code than to killing an active feature.
Longer term, the plan is to integrate Sigstore[1]-based signatures.
[1]: https://www.sigstore.dev/
[+] [-] jossclimb|2 years ago|reply
[+] [-] sacnoradhq|2 years ago|reply
FWIW, Ruby also did a piss-poor job of handling gem signing by making it both difficult and optional.
How fucking hard is it to get to the level of code release assurance as Debian or Fedora? Manage GPG keys, signfest them, and enforce a policy.
[+] [-] rvz|2 years ago|reply
Sigstore [0] on the other hand makes more sense to use instead.
[0] https://www.sigstore.dev
[+] [-] msm_|2 years ago|reply
I don't know much about the solution you promote, but as usual with many "PGP killers" it replaces one very specific application of PGP and ignores all the others. Which is ok! Doing one thing and doing it well is the Unix philosophy after all. But it's not something I have use for, and it's not a viable replacement for GPG.
[+] [-] LtWorf|2 years ago|reply
We will instead switch to use some thing with a fluffy corporate website that tells absolutely nothing.
[+] [-] 7to2|2 years ago|reply
Why on earth wasn't the community asked before you implemented this change?
> Given all of this, the continued support of uploading PGP signatures to PyPI is no longer defensible. While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.
This uninformed reasoning is what's indefensible.
[+] [-] forgotmypw17|2 years ago|reply
One to compile a list of file hashes and PGP-sign them.
One to validate these hashes against the provided signatures.
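A rough sketch of what such a pair of scripts could look like, with both steps as functions in one file. The names `build_manifest` and `mismatches` are hypothetical, and the PGP step is deliberately left out: the manifest would be signed and its signature checked with a separate tool before the hashes are trusted.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def mismatches(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or whose hash has changed."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("hello")
    (root / "b.txt").write_text("world")
    manifest = build_manifest(root)       # step one: compile the hash list
    clean = mismatches(root, manifest)    # step two: validate, nothing changed
    (root / "b.txt").write_text("tampered")
    dirty = mismatches(root, manifest)    # validate again after a modification
```

Files added after the manifest was built are not reported by this sketch; only entries listed in the manifest are checked.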