Removing PGP from PyPI

[+] tzs|2 years ago|reply

> In the last 3 years, about 50k signatures had been uploaded to PyPI by 1069 unique keys. Of those 1069 unique keys, about 29% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19)

Why not include the public key in the package?

99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.
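
A minimal sketch of that "same people as before" check, assuming the index shipped the publisher's public key alongside each release (it does not today). The `check_continuity` helper, the JSON pin file, and the use of a plain SHA-256 over the key bytes as a stand-in fingerprint are all hypothetical, and this only checks key continuity; it does not verify any signature.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(public_key_bytes: bytes) -> str:
    # Stand-in fingerprint: SHA-256 over the raw key bytes.
    # (Assumption: a real OpenPGP fingerprint is computed differently.)
    return hashlib.sha256(public_key_bytes).hexdigest()


def check_continuity(store: Path, package: str, public_key_bytes: bytes) -> str:
    """Trust on first use: pin the first key seen for a package,
    then compare every later release's key against that pin."""
    pins = json.loads(store.read_text()) if store.exists() else {}
    seen = fingerprint(public_key_bytes)
    pinned = pins.get(package)
    if pinned is None:
        # First time we see this package: remember its key.
        pins[package] = seen
        store.write_text(json.dumps(pins, indent=2))
        return "pinned"
    return "match" if pinned == seen else "mismatch"
```

A "mismatch" result would only mean the key changed, which can also happen for legitimate reasons such as key rotation.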

[+] Arnavion|2 years ago|reply

>Of those 1069 unique keys, about 29% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures.

I don't know if it applies to any of those 1069 keys, but note that there is a way of hosting PGP keys that does not depend on key servers: WKD https://datatracker.ietf.org/doc/draft-koch-openpgp-webkey-s... . You host the key at a .well-known URI under the domain of the email address. It's a draft as you can see, but I've seen a few people using it (including myself), and GnuPG supports it.
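
For anyone curious what the lookup looks like, here is a rough sketch of how a client could derive the WKD URL from an email address. It follows my reading of the draft's "direct" method (lowercase the local part, SHA-1 it, z-base-32 encode the digest); the function names are my own, and the details should be checked against the current draft before relying on them.

```python
import hashlib
from urllib.parse import quote

# z-base-32 alphabet (each character encodes 5 bits).
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32(data: bytes) -> str:
    # Turn the bytes into a bit string, pad to a multiple of 5 bits,
    # then map each 5-bit group to one alphabet character.
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)
    return "".join(ZBASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def wkd_direct_url(email: str) -> str:
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    # The hash is taken over the lowercased local part only.
    digest = hashlib.sha1(local.lower().encode("utf-8")).digest()
    return (
        f"https://{domain.lower()}/.well-known/openpgpkey/hu/"
        f"{zbase32(digest)}?l={quote(local)}"
    )
```

Note that the input is an email address, so a client needs to know the signer's address before it can look anything up.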

[+] woodruffw|2 years ago|reply

This is interesting, but it doesn't really solve the key distribution problem: with well-known hosting you now have a (weak) binding to some DNS path, but you're not given any indication of how to discover that path. It's also not clear that a DNS identity is beneficial to PyPI in particular (since PyPI doesn't associate or namespace packages at all w/r/t DNS).

More generally, these kinds of improvements are not a sufficient reason to retain PGP: even with a sensible identity binding and key distribution, PGP is still a mess under the hood. The security of a codesigning scheme is always the weakest signature and/or key, and PGP's flexibility more or less ensures that that weakest link will always be extremely weak.

[+] Avamander|2 years ago|reply

You can't do WKD with just signatures, there are no identities associated with the signature to just look up.

[+] usr1106|2 years ago|reply

Isn't that throwing out the baby with the bathwater? There seem to be non-negligible risks of installing malware from PyPI according to various headlines recently. But instead of improving security measures that don't work well they just remove them?

[+] donaldstufft|2 years ago|reply

Removing security features that don't work is a separate concern from making security features that do work. Nobody who has done any serious work on PyPI security in the past 15 years thinks that GPG will play a part in the future of PyPI security. Its support was entirely vestigial, served no practical purpose, and never would.

[+] chatmasta|2 years ago|reply

Most supply chain attacks rely on dependency confusion or typo-squatting, which PGP signing doesn't solve. An attacker can PGP sign their typosquatted package, and the package manager won't know to alert you because as far as it can tell, you intended to install that package. (This is before even considering whether the packages are signed with strong keys, or users are actually verifying them against any public trust store.) That's one reason supply chain issues are so pernicious - they're more of a human problem than a technical one.

That said, I do agree with your premise that the limited usefulness of PGP signing doesn't necessitate removing the feature entirely.
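
To illustrate the typosquatting point: whether a name is a near miss of a popular package is a string-similarity question, which a signature says nothing about. A toy sketch, where the `POPULAR` list and the 0.8 cutoff are made-up assumptions and not anything PyPI or pip actually does:

```python
import difflib

# Hypothetical allowlist of well-known package names.
POPULAR = ["requests", "numpy", "urllib3", "boto3"]


def possible_typosquat(name, popular=POPULAR, cutoff=0.8):
    """Return the popular name that `name` closely resembles, or None.

    An exact match is the real package, so it is never flagged.
    """
    name = name.lower()
    if name in popular:
        return None
    close = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
    return close[0] if close else None
```

A validly signed upload named "reqeusts" would still be flagged by this check, and would still pass any signature check.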

[+] masklinn|2 years ago|reply

> Isn't that throwing out the baby with the bathwater?

That assumes there’s a baby in the bath water.

> But instead of improving security measures that don't work well they just remove them?

Well yes, “security measures” which don’t work are usually worse than nothing.

[+] Brian_K_White|2 years ago|reply

There are many cases where it's better to know you don't have something correctly than think you have something incorrectly. Security is certainly one.

[+] eduction|2 years ago|reply

So they examined everything uploaded to PyPi with a signature over three years, including old versions, and classify those packages whose signing key is expired today, possibly years later, as "impossible to meaningfully verify." Never mind that the package may have been verifiable with a valid key for a full year or two before the key expired, and in the meantime may have been superseded by a newer version.

They also say they can't "meaningfully verify" packages if the key does not have "binding identity information," by which they presumably mean automatically verifiable binding identity information, which usually means someone verified an email from keys.openpgp.org. This is a really narrow way to establish "binding identity information." For example someone who is a PyPi author and publicly links their PGP key from a (https) website on the same domain as the email on the key would not count. A well known longtime PyPi author with a well known key would not count.

The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.

This has the smell of "we didn't want to bother supporting PGP any more because it's hard so we came up with an excuse."

No need for an excuse, though: Just be honest about it and let the chips fall where they may, if you really don't want to support PGP. God knows there are valid reasons for not having the energy to deal with PGP. (FWIW I think it's a good solution for packages, for those who can navigate the tooling, but on the other hand I'm not volunteering my time to run PyPi.)

P.S. There is a link in their post saying PGP has "documented issues." The specific issue described in the linked document is "packaging signing is not the holy grail" and a list of known things about PGP, like that verification of keys is ad hoc. It also concludes that there is no known better alternative.

[+] woodruffw|2 years ago|reply

> The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.

This is revisionist: in 2005, PGP was approachingly modern and represented an acceptable tradeoff between usability, legal and patent constraints, and arms laws. It was also accompanied by a network of synchronizing keyservers and a "strong set" within the Web of Trust that, in principle, gave you transitive authenticity for artifacts. That never really worked as expected, but it's all code and infrastructure that was actually running in 2005, when PyPI chose to allow PGP signatures.

None of that is the case in 2023: PGP is 20 years behind cryptographic best practices, and has 30 years of unresolved technical debt. There is no web of trust, and the synchronizing keyserver network has been broken for years.

The argument for PGP in 2005 was that it was, to a first approximation, the best that could be done. The argument against PGP in 2023 is that, to a first approximation, it's worse than useless by virtue of providing false security guarantees.

[+] tptacek|2 years ago|reply

There are much, much better solutions for packaging!

[+] rwmj|2 years ago|reply

So they're removing PGP signatures, which certainly have some issues, and replacing them with ... nothing?

[+] zzzeek|2 years ago|reply

> Of those 1069 unique keys, about 29% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19).

so...*reject those packages*. if you use a PGP key that isn't properly available or verifiable, reject it. That way every package with a PGP key will have 100% "key is properly discoverable" rate.

it's not really reasonable to just drop this feature because most packages don't use it. Packages with tens of millions of downloads (like mine) make up a small percentage of total packages, but this small number of packages makes up a huge proportion of actual downloads, and package signing is most useful for these kinds of packages.

if the adoption of "proper PGP keys" were ranked by packages/ downloads rather than "packages" alone, these rates would be much different.

[+] donaldstufft|2 years ago|reply

I don't believe they would.

Looking at the top 20 packages in the last month by download (packages with hundreds of millions of downloads), only 1 of them shipped a GPG signature with their most recent release. I haven't asked the author of that one, but I do know them and I suspect they agree with the idea that it's not a valuable thing and they do it largely because it exists.

[+] jpgvm|2 years ago|reply

I don't understand how Java can get this right with Maven Central and co but newer languages can't.

Having a slight barrier to entry, which is essentially "you must learn why signing is important for users of your library and this is how to do it", a) really isn't that bad, b) doesn't result in fewer quality packages being uploaded, and c) if it acts like any sort of filter, that seems to be a good thing.

Maven Central isn't short of high quality packages and no high quality OSS Java libraries are missing so the filter aspect isn't culling anything important.

Java, Apt, RPM, etc all have this and have absolutely gigantic numbers of packages so the argument that it's too hard really just doesn't hold water.

Doing so requires reading/understanding these ~3 pages of docs: https://central.sonatype.org/publish/requirements/gpg/

[+] B1FF_PSUVM|2 years ago|reply

> newer languages can't.

Python (1991) is older than Java (1995)

(irrelevant factoid, but still ...)

[+] donaldstufft|2 years ago|reply

I don't believe that Maven Central's use of GPG is providing a meaningful security control here, so I would dispute the idea that they're doing it "right".

[+] blibble|2 years ago|reply

> I don't understand how Java can get this right with Maven Central and co but newer languages can't.

it's the magic combination of pushing their own agenda (vs. that of their users), mixed with ineptitude

[+] westurner|2 years ago|reply

Now that you have removed GPG ASC signature upload support, is there any way for publishers to add cryptographic signatures to packages that they upload to pypi? FWIU only "the server signs uploads" part of TUF was ever implemented?

Why do we use GPG ASC signatures instead of just a checksum over the same channel?

[+] woodruffw|2 years ago|reply

> Why do we use GPG ASC signatures instead of just a checksum over the same channel?

Could you elaborate on what you mean by this? PyPI computes and supplies a digest for every uploaded distribution, so you can already cross-check integrity for any hosted distribution.

GPG was nominally meant to provide authenticity for distributions, but it never really served this purpose. That's why it's being removed.
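
For reference, the digest cross-check mentioned above can be sketched in a few lines. This assumes you already have the file on disk and the SHA-256 hex digest the index publishes for it; the function names are my own. It only shows that the file matches what the index lists, which is integrity relative to the index and not authenticity.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20) -> str:
    # Hash the file in chunks so large distributions need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    # Normalize the published value, then compare in constant time.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```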

[+] ilyt|2 years ago|reply

Signature tells you who signed it.

Of course, if you haven't put any effort into a system to verify end-to-end whether it's the right signature, it doesn't matter.

[+] jwilk|2 years ago|reply

Two days ago: https://news.ycombinator.com/item?id=36021172 ("PGP signatures on PyPI: worse than useless", >150 comments)

[+] WhyNotHugo|2 years ago|reply

When many developers didn't use 2FA they pushed for them to enable 2FA within a deadline. It sounds like the same approach could have been used for PyPI. E.g.: an attempt to make the feature useful before declaring it dead forever.

[+] woodruffw|2 years ago|reply

This has very little to do with 2FA: PGP signing has been de facto dead for years on PyPI, and this change has no effect on publishing workflows: PyPI will still accept uploads that contain signatures, and just ignores them now.

It's also not accurate to say that PyPI failed to make 2FA useful: it was deployed for over two years before the 2FA mandate for critical projects went into effect. That mandate also came with free hardware keys for everyone affected.

[+] masklinn|2 years ago|reply

No. 2FA is a feature for pypi, and developers. The entire purpose of pgp sigs was external, it was for distributions to use.

Distributions don’t use it, therefore it’s worthless, just overhead and technical debt.

[+] bryanlyon|2 years ago|reply

I came here thinking they were removing the PGP package from PyPi, but they're just removing a barely-used signature system? I don't know why they have to remove it though. I doubt it requires much maintenance now that it's already in place.

Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.

[+] tedivm|2 years ago|reply

They address your comment directly in their post-

> While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.

[+] Avamander|2 years ago|reply

> Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.

Discoverable. That does not really verify anything about the key, its identities or the supposed signer.

It boils down almost entirely to just an overcomplicated hashing system.

[+] zokier|2 years ago|reply

At least you can't blame pypi for ignoring the report, and tbh I find this response time remarkably quick. It wouldn't have been far-fetched to imagine someone in their position just trying to ignore/downplay/dispute this sort of report.

[+] masklinn|2 years ago|reply

As the author of the post noted above, the pypi maintainers have been wanting to get rid of pgp for a while.

The post gave them excellent additional justification to do so.

[+] unknown|2 years ago|reply

[deleted]

[+] jxy|2 years ago|reply

I don't understand the argument. Isn't the whole point of PGP establishing some kind of chain of trust? If pypi.org has its own public key, it could sign a few major distributors' keys, and for smaller/individual packages I could either choose to always trust the same public key or not use the package. It's not a centralized system to begin with. It's not pypi.org's responsibility to identify and verify that all the keys belong to who they say they belong to. Pypi.org's inability to verify individual identities shouldn't impact the overall usefulness of PGP for package distribution and verification.

[+] NotYourLawyer|2 years ago|reply

Interesting timeline. The Yossarian article that TFA cites and that I assumed was the impetus here was published two days ago on 5/21. But the audit was two days earlier on 5/19.

[+] woodruffw|2 years ago|reply

I originally ran the audit on 3/27 (IIRC), and then ran it a few additional times as I fixed data quality issues in my scripts (the ones linked in the post). The last time I ran it was on 5/19-5/20, when I was finalizing the post. You can also see that I did a new release of `pgpkeydump` at around the same time, to add some more extracted datapoints.

PyPI's admins have been wanting to remove PGP support for years; all I did was provide the final nudge.

[+] reidrac|2 years ago|reply

I have been thinking about this in the context of Java libraries (really using Scala, but bear with me).

If the repo requires a GPG signature, they could also ask for the public key of the developer making the releases (e.g. when they make the account), and they could sign it with their key at that point.

Then make available the package, the signature, and the signed public key. Then I only need to trust the repo's key (in this case PyPi).

Does this make any sense?

[+] woodruffw|2 years ago|reply

> Does this make any sense?

It makes sense in terms of trusting the package index, but it's inverted from the original design goal: the point of end-user signatures on package indices is to eliminate unnecessary package index trust, not reinforce it.

If you already trust the package index, then mandating HTTPS and strong cryptographic digests is going to be far more effective (and secure) than some kind of PGP key attestation scheme.

[+] KRAKRISMOTT|2 years ago|reply

What are we switching to? Does Pypi support ECDSA?

[+] woodruffw|2 years ago|reply

Just for disambiguation: ECDSA is a signing algorithm, not a protocol or toolkit like PGP. PGP can produce ECDSA signatures through an extension RFC, but it's not a core part of OpenPGP.

There is no immediate replacement, because the overwhelming majority of packages never bothered to sign with PGP (and all evidence points to the overwhelming majority of signatures never being verified). In other words, this is much closer to removing "dead" code than to killing an active feature.

Longer term, the plan is to integrate Sigstore[1]-based signatures.

[1]: https://www.sigstore.dev/

[+] unknown|2 years ago|reply

[deleted]

[+] jossclimb|2 years ago|reply

sigstore I hope.

[+] sacnoradhq|2 years ago|reply

So how are Python packages signed? Are they just shipping rando code without any sort of E2E assurance?

FWIW, Ruby also did a piss-poor job of handling gem signing by making it both difficult and optional.

How fucking hard is it to get to the level of code release assurance as Debian or Fedora? Manage GPG keys, signfest them, and enforce a policy.

[+] rvz|2 years ago|reply

PGP is a solution in search of a problem. We have given it decades for it to be useful and it turns out that it is an enormous security failure. It needed to go.

Sigstore [0], on the other hand, makes more sense to use instead.

[0] https://www.sigstore.dev

[+] msm_|2 years ago|reply

This reads like an advertisement. I routinely use GPG, and it is useful for me. It's not perfect (far from perfect, really), but it's a solution for multiple of my problems.

I don't know much about the solution you promote, but as usual with many "PGP killers" it replaces one very specific application of PGP and ignores all the others. Which is ok! Doing one thing and doing it well is the Unix philosophy after all. But it's not something I have use for, and it's not a viable replacement for GPG.

[+] LtWorf|2 years ago|reply

I'll let my boss know we must stop signing our releases and having our software automatically check if the new version is legit then.

We will instead switch to use some thing with a fluffy corporate website that tells absolutely nothing.

[+] 7to2|2 years ago|reply

Trust on first use is absolutely a valid use of PGP signatures that is being used in many real world systems (ask me how I know). You finding that PGP isn't being used the way you think it should does not justify removing it without providing a replacement.

Why on earth wasn't the community asked before you implemented this change?

> Given all of this, the continued support of uploading PGP signatures to PyPI is no longer defensible. While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.

This uninformed reasoning is what's indefensible.

[+] forgotmypw17|2 years ago|reply

What an amazing opportunity for someone to add a new way of integrating PGP authentication by writing two short scripts:

One to compile a list of file hashes and PGP-sign them.

One to validate these hashes against the provided signatures.
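
A rough sketch of the hashing half of those two scripts, under the assumption that the manifest is a `sha256sum`-style text file kept outside the directory it describes. The function names are hypothetical. The signing step is left to GnuPG itself, for example `gpg --armor --detach-sign SHA256SUMS` to sign and `gpg --verify SHA256SUMS.asc SHA256SUMS` to check; deciding which key to trust is still up to the person verifying.

```python
import hashlib
from pathlib import Path
from typing import List


def build_manifest(directory) -> str:
    # One "<sha256>  <filename>" line per regular file, in sorted order.
    lines = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.name}")
    return "\n".join(lines) + "\n"


def verify_manifest(directory, manifest_text: str) -> List[str]:
    """Return the names of files that are missing, altered, or suspicious."""
    bad = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split("  ", 1)
        # Refuse names that would escape the directory.
        if Path(name).name != name:
            bad.append(name)
            continue
        path = Path(directory) / name
        if not path.is_file() or hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            bad.append(name)
    return bad
```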

187 comments