Are We PEP740 Yet?

[+] simonw|1 year ago|reply

I suggest reading this detailed article to understand why they built this: https://blog.trailofbits.com/2024/11/14/attestations-a-new-g...

The implementation is interesting - it's a static page built using GitHub Actions, and the key part of the implementation is this Python function here: https://github.com/trailofbits/are-we-pep740-yet/blob/a87a88...

If you read the code you can see that it's hitting pages like https://pypi.org/simple/pydantic/ - which return HTML by default - but sending this header to request JSON instead:

    Accept: application/vnd.pypi.simple.v1+json

Then scanning through the resulting JSON looking for files that have a provenance that isn't set to null.

Here's an equivalent curl + jq incantation:

    curl -s \
      -H 'Accept: application/vnd.pypi.simple.v1+json' \
      https://pypi.org/simple/pydantic/ \
    | jq '.files | map(select(.provenance != null)) | length'
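For anyone who would rather do this from Python, here is a rough sketch of the same check using only the standard library. This is not the code from the linked repository; the function names `fetch_simple_index` and `count_attested_files` are made up for illustration, and the only things taken from the comment above are the URL pattern, the Accept header, and the "provenance is not null" test.

```python
import json
import urllib.request

# Media type from the Accept header shown above.
SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def fetch_simple_index(project: str) -> dict:
    """Fetch the JSON form of a project's simple index page (network call)."""
    request = urllib.request.Request(
        f"https://pypi.org/simple/{project}/",
        headers={"Accept": SIMPLE_JSON},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_attested_files(index: dict) -> int:
    """Count files whose "provenance" value is present and not null.

    Mirrors the jq filter: a missing key is treated the same as null.
    """
    return sum(
        1 for f in index.get("files", []) if f.get("provenance") is not None
    )
```

Calling `count_attested_files(fetch_simple_index("pydantic"))` should give the same number as the curl + jq pipeline, assuming the index is reachable.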

[+] Cthulhu_|1 year ago|reply

That's the first time I've seen JSON api standard headers in the wild. There was a project where an architect indicated our APIs should be built in that fashion, but people just... disregarded it completely out of pragmatism, also because our endpoints were just pure API / JSON endpoints, never anything else. But seeing how it's used in the wild is pretty clever, same endpoint for different use cases.
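The "same endpoint for different use cases" idea is ordinary HTTP content negotiation. A minimal sketch of how a server might pick a representation from the Accept header is below. It is an illustration only, not PyPI's implementation: `parse_accept` and `choose_representation` are hypothetical names, falling back to HTML is an assumption based on the pages returning HTML by default, and the parsing is deliberately simplified (no wildcard subtypes such as `text/*`).

```python
JSON_TYPE = "application/vnd.pypi.simple.v1+json"
HTML_TYPE = "text/html"


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_representation(header: str, default: str = HTML_TYPE) -> str:
    """Return the media type to serve for a given Accept header."""
    supported = (JSON_TYPE, HTML_TYPE)
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

A browser sending `text/html` gets the HTML page, while a client sending the JSON media type gets JSON from the same URL.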

[+] cyrnel|1 year ago|reply

Why invest so much time and money in a feature that prevents such a small percentage of data breaches that it's not even categorized on the 2024 Verizon Data Breach Investigations Report?

The vast majority of breaches are caused by credential theft, phishing, and exploiting vulnerabilities.

It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.

[+] darkamaul|1 year ago|reply

The fact that a security measure doesn't solve all or even most breaches doesn't mean it's not worth implementing. Supply chain attacks may be a smaller percentage of breaches, but they can have massive impact when they do occur (see SolarWinds). Security is all about layers - each measure raises the bar incrementally.

[+] some_furry|1 year ago|reply

Because Verizon's report, while a good read, isn't the end-all-be-all of threat intelligence.

https://www.wired.com/story/notpetya-cyberattack-ukraine-rus...

https://krebsonsecurity.com/2020/12/u-s-treasury-commerce-de...

Software supply chain attacks are rare, but when they happen, they're usually high-impact ordeals.

[+] tptacek|1 year ago|reply

The 2024 DBIR, for whatever it's worth, repeatedly mentions software supply chain attacks.

[+] twothreeone|1 year ago|reply

Probably because they got a government contract under which they receive funding for 3-5 FTEs over 24-36 months in return for quarterly reports - and a tool like this makes the DARPA PM happy. They're one of those "Cyber Defense Contractors"..

[+] Cthulhu_|1 year ago|reply

Why not? You're presenting a false dichotomy: the time spent on this security measure does not take away from time spent on the others you mentioned, and ultimately all security measures should be taken.

[+] itsgrimetime|1 year ago|reply

> Why invest so much time and money in a feature that prevents such a small percentage of data breaches ...

Because it's a tractable problem that these devs can solve - and just because they're working on this doesn't mean they (or others) aren't also working on the other things.

> It doesn't matter that you can cryptographically verify that a package came from a given commit if ...

Sure, but just because it doesn't solve every single problem doesn't mean it's not worthwhile

[+] gklitz|1 year ago|reply

They already went through requiring 2FA for the most popular packages: https://blog.pypi.org/posts/2023-05-25-securing-pypi-with-2f...

This is just another step in increasing security. And of course that is something you preferably want to do before breaches happen, not only as a reaction.

[+] tzlander|1 year ago|reply

This method favors big corporations and provides further lock-in. Python only does what Microsoft/Instagram etc. demand.

So you get suit-compatible catch phrases like "SBOM" (notice how free software has been deliberately degraded to "materials" in that acronym!).

The corporations want to control open source, milk it, feed it to their LLMs, plagiarize it and so forth. And they pay enough "open" source developers who sell out the valuable parts that are usually written by other people.

As you say, it's partly security theater because of the other attack vectors that are especially relevant in an organization that has no stringent procedures, no open discussion culture or commitment to correctness like e.g. FreeBSD.

[+] rfoo|1 year ago|reply

> It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.

If that commit has accidentally-vulnerable code, or someone gets phished and an attacker adds malicious code to the repository using their credentials, it is visible.

However, if the supply chain is secretly compromised so that the VCS repo stays clean and only the release contains malware, then good luck finding out.

We all witnessed this earlier this year in the xz incident: while the (encrypted) malicious code was present in the source code repo as part of test data, the code to load and decrypt it only ever existed in release tarballs.

[+] Tknl|1 year ago|reply

https://slsa.dev/ gives much clearer explanations about the why of this work. GitHub recently started offering a SaaS sigstore implementation including support for private repos. https://docs.github.com/en/actions/security-for-github-actio... Anyone working on OT should be quickly moving towards this.

[+] marky1991|1 year ago|reply

Could someone explain why this is important? My uninformed feeling towards PEP 740 is 'who cares?'.

[+] darkamaul|1 year ago|reply

Supply chain attacks can exploit gaps between source code and distributed packages. Today, if PyPI were to be compromised, attackers could serve malicious packages even if the source code is clean.

Attestations provide cryptographic proof linking published packages to specific code states. This proof can be verified independently of PyPI - reducing exclusive trust in the package index.
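To make "linking published packages to specific code states" a little more concrete, here is a sketch of just the final digest comparison. It assumes you have already verified an attestation by other means and obtained from it an expected SHA-256 hex digest for the file; the signature and identity verification, which is the hard part, is out of scope here. The function name `matches_expected_digest` is hypothetical.

```python
import hashlib
import hmac


def matches_expected_digest(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the SHA-256 of `data` equals the expected hex digest.

    `expected_sha256_hex` is assumed to come from an already-verified
    attestation; this function performs no signature checks itself.
    """
    actual = hashlib.sha256(data).hexdigest()
    # hexdigest() is lowercase, so normalize the expected value too.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

If the bytes you downloaded do not hash to the attested digest, the file is not the one that was attested, regardless of what the index claims.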

Worth noting, attestations aren't a complete defense against index compromises since an attacker could simply stop serving attestations (though this would raise alerts for users monitoring their dependencies' attestation status).

Is this a silver bullet? No. If an attacker compromises a project's source repository, attestations won't help. However, it meaningfully reduces certain attack vectors and moves us towards being able to cryptographically verify the entire chain from source code to deployed package.

(Disclaimer: I helped build this feature for PyPI)

[+] hadlock|1 year ago|reply

I believe this is a system where a human/system builds a package, uploads it, and cryptographically signs it, verifying end to end that the code uploaded to GitHub for widget-package 3.2.1 is the code you're downloading to your laptop for widget-package 3.2.1, and that there's no chance it was modified/signed by an adversarial third party.

[+] unknown|1 year ago|reply

[deleted]

[+] rty32|1 year ago|reply

https://en.m.wikipedia.org/wiki/XZ_Utils_backdoor

[+] progval|1 year ago|reply

According to this page, urllib3 does not use trusted publishing. According to https://docs.pypi.org/project_metadata/#verified-details , trusted publishing and self-links are the only ways to have "verified details". However https://pypi.org/project/urllib3/ shows Changelog/Code/Issue tracker as "Verified details" even though they are not self-links. How come?

urllib3 does not have a recent release that could explain https://trailofbits.github.io/are-we-pep740-yet/ lagging behind.

[+] darkamaul|1 year ago|reply

This page only shows if a package has been uploaded with attestations. The verified details (Changelog/Code/Issue tracker) are showing because they do use Trusted Publishing.

However, they have not published a new version since the beginning of attestation support in PyPI. That's the meaning of the clock icon to the right of the package name.

Their workflow responsible for publishing new releases [1] has support for attestations. Thus, it will turn green on this page with the next project release.

[1] https://github.com/urllib3/urllib3/blob/main/.github/workflo...
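The states described above can be sketched roughly as follows. This is a guess at the logic, not the page's actual code: the `Status` names, the third "not attested" state, and the `attestations_enabled_at` cutoff parameter are all assumptions for illustration, and the caller has to supply the cutoff date.

```python
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    ATTESTED = "green"          # latest release was uploaded with attestations
    AWAITING_RELEASE = "clock"  # no release since attestation support began
    NOT_ATTESTED = "none"       # released since then, but without attestations


def classify(
    latest_upload: datetime,
    has_attestation: bool,
    attestations_enabled_at: datetime,
) -> Status:
    """Classify a project from its latest upload time and attestation flag."""
    if has_attestation:
        return Status.ATTESTED
    if latest_upload < attestations_enabled_at:
        return Status.AWAITING_RELEASE
    return Status.NOT_ATTESTED
```

Under this sketch, a project like the one discussed here (no release since attestation support began) lands in the clock state until its next upload.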

[+] physicsguy|1 year ago|reply

People don’t have to use GitHub, and certainly don’t have to use GitHub Actions even if they do

[+] globular-toast|1 year ago|reply

"How can I trust you?"

"I am trusted."

It's basically the same model as HTTPS. Not sure if it has a name. "Too big to fail" security? Security by fiat?

[+] Arch-TK|1 year ago|reply

Something this doesn't answer:

Can I make my package green without having to compromise my integrity by utilising proprietary git hosting?

[+] blenderob|1 year ago|reply

I've got the same question. Does anyone know the answer?

[+] darthwalsh|1 year ago|reply

I read that GitLab was going to be supported too.

[+] zahlman|1 year ago|reply

>Using a Trusted Publisher is the easiest way to enable attestations, since they come baked in! See the PyPI user docs and official PyPA publishing action to get started.

For many smaller packages in this top 360 list I could imagine this representing quite a bit of a learning curve.

[+] amiga386|1 year ago|reply

Or it could see Microsoft tightening its proprietary grip over free software by not only generously offering gratis hosting, but now also it's a Trusted Publisher and you're not - why read those tricky docs? Move all your hosting to Microsoft today, make yourself completely dependent on it, and you'll be rewarded with a green tick!

[+] simonw|1 year ago|reply

I think it's pretty hard to get a Python package into the top 360 list while not picking up any maintainers who could climb that learning curve pretty quickly. I wrote my own notes on how to use Trusted Publishers here: https://til.simonwillison.net/pypi/pypi-releases-from-github

The bigger problem is for projects that aren't hosting on GitHub and using GitHub Actions - I'm sure there are quite a few of those in the top 360.

I expect that implementing attestations without using the PyPA GitHub Actions script has a much steeper learning curve, at least for the moment.

[+] woodruffw|1 year ago|reply

I suspect that most of the packages in the top 360 list are already hosted on GitHub, so this shouldn’t be a leap for many of them. This is one of the reasons we saw Trusted Publishing adopted relatively quickly: it required less work and was trivial to adopt within existing CI workflows.

[+] unknown|1 year ago|reply

[deleted]
