> This is because app stores do a lot of heavy lifting to provide security for the app ecosystem. Specifically, they provide integrity, ensuring that apps being delivered are not tampered with, consistency, ensuring all users get the same app, and transparency, ensuring that the record of versions of an app is truthful and publicly visible.
The Google Play Store does none of this, lol. All apps created since 2021 have to make use of Google Play App Signing, which means Google holds the keys used to sign the app. They leverage this to include stuff like their Play Integrity in the builds that are served. The Android App Bundle format means that completely different versions of the app are delivered depending on the type of device, locale, etc. There is 0 transparency about this for the end-user.
As a further addition, Google does this for show (it’s not their business model) and is not equipped to deal with the criminals at Meta, as recently became apparent from, among other things, this disclosure:
https://archive.is/nWpDZ
https://localmess.github.io
This is really cool, and I'm excited to hear that it's making progress.
Binary transparency allows you to reason about the auditability of the JavaScript being delivered to your web browser. This is the first significant step towards a solution to the "JavaScript Cryptography Considered Harmful" blog post.
The remaining missing pieces here are, in my view, code signing and the corresponding notion of public key transparency.
It would be helpful if they included a problem statement of some sort.
I don't know what problem this solves.
While I could possibly read all this and deduce what it's for, I probably won't... (the stated premise of this, "It is as true today as it was in 2011 that Javascript cryptography is Considered Harmful." is not true.)
For me, the key problem being solved here is to have reasonably trustworthy web implementations of end-to-end-encrypted (E2EE) messaging.
The classic problem with E2EE messaging on the web is that the point of E2EE is that you don't have to trust the server not to read your messages, but if you're using a web client you have to trust the server to serve you JS that won't just send the plain text of your messages to the admin.
The properties of the web really exacerbate this problem, as you can serve every visitor to your site a different version of the app based on their IP, geolocation, tracking cookies, whatever. (Whereas with a mobile app everyone gets the same version you submitted to the app store).
With this proposed system, we could actually have really trustworthy E2EE messaging apps on the web, which would be huge.
(BTW, I do think E2EE web apps still have their place currently, if you trust the server to not be malicious (say, you or a trusted friend runs it), and you're protecting from accidental disclosure)
This allows you to validate that "what you sent is what they got", meaning that the code and assets the user's browser executes are exactly what you intended to publish.
So, this gives web apps and PWAs some of the same guarantees of native app stores, making them more trustworthy for security-sensitive use cases.
Ok (let's pretend I didn't see the word "blockchain" there), but none of this should interfere with browser extensions that need to modify the application code.
EDIT: Disregard this comment. I think there was a technical issue on my computer. Keeping the original comment below.
-----
> let's pretend I didn't see the word "blockchain" there
There's nothing blockchain about this blog post.
I think this might be a rectangles vs squares thing. While it's true that all blockchains use chains of hashes (e.g., via Merkle trees), it's not true that all uses of append-only data structures are cryptocurrency.
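To make the distinction concrete, here is a minimal sketch of an append-only hash chain in Python. It is an illustration only and does not claim to be how the logs in the blog post work; `GENESIS`, `extend`, and `chain_head` are names made up for this example. The point is that someone holding an old head can check that a newer head only appends to the history they already saw, with no coins or consensus involved.

```python
import hashlib

# Arbitrary fixed starting value for the chain (an assumption of this sketch).
GENESIS = b"\x00" * 32


def extend(head: bytes, entry: bytes) -> bytes:
    """Return the new head after appending one entry to the chain."""
    return hashlib.sha256(head + hashlib.sha256(entry).digest()).digest()


def chain_head(entries: list[bytes], start: bytes = GENESIS) -> bytes:
    """Fold entries into a head hash, starting from `start`."""
    head = start
    for entry in entries:
        head = extend(head, entry)
    return head


# Hypothetical log contents.
log_v1 = [b"manifest-v1", b"manifest-v2"]
log_v2 = log_v1 + [b"manifest-v3"]
rewritten = [b"manifest-v1", b"manifest-EVIL", b"manifest-v3"]

old_head = chain_head(log_v1)

# Honest append: replaying only the new entry on top of the old head
# reproduces the new head.
appended_ok = chain_head([b"manifest-v3"], start=old_head) == chain_head(log_v2)

# Rewritten history: the same replay does not reproduce the new head.
rewrite_detected = chain_head([b"manifest-v3"], start=old_head) != chain_head(rewritten)
```

Real transparency logs typically use Merkle trees so that these checks stay small as the log grows; the chain above is just the simplest structure with the same append-only flavor.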
Much of what is described in the article can be accomplished with content addressable storage.
If we develop an internet where links describe the integrity of a file then we don't have to worry about the content changing out from underneath us. Additionally we get the benefit of being able to distribute the files we depend on anywhere.
Why make a map of hashes that correspond to human readable file urls, when we can directly link to hashes?
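As a rough sketch of what "directly linking to hashes" means, the Python below keys every blob by its SHA-256 digest and re-checks the digest on every read. `ContentStore` and `address_of` are hypothetical names for this example, not part of any existing system mentioned here.

```python
import hashlib


def address_of(content: bytes) -> str:
    """The address of a blob is simply the hex SHA-256 of its bytes."""
    return hashlib.sha256(content).hexdigest()


class ContentStore:
    """Toy content-addressed store; stands in for any untrusted host or mirror."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        addr = address_of(content)
        self._blobs[addr] = content
        return addr

    def get(self, addr: str) -> bytes:
        content = self._blobs[addr]
        # The reader verifies the bytes against the address it asked for,
        # so it does not matter who served them.
        if address_of(content) != addr:
            raise ValueError("content does not match its address")
        return content


store = ContentStore()
link = store.put(b"console.log('v1');")
fetched = store.get(link)
```

Because the address changes whenever the bytes change, a link like this can never point at a newer version, which is the gap the reply below is about.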
Yes, if every single URL in your web application has a hash in it (including <a> hrefs) then you don’t have to worry about anyone maliciously serving a webpage anymore.
But how do you get new app versions? I argue, if you want any meaningful security guarantees, an answer to this question will require transparency and/or code signing (which itself requires transparency, per my comment below)
I am a big fan of code verification. But from what I read, the process suggested is complicated. We already have the integrity attribute to protect scripts. What would be needed to make it watertight is to change the hashing to include another token, maybe delivered via a name service, so that the browser can verify it. Case closed. Minor change in the browser, complete protection against manipulation.
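For reference, the existing integrity check is small enough to sketch. The Python below computes a value in the Subresource Integrity format (algorithm name, a dash, then the base64 digest) and checks content against it. It only covers the single-hash case, and it does not implement the extra token proposed above, which is not specified in enough detail to sketch. `sri_value` and `matches` are names invented for this example.

```python
import base64
import hashlib

# Hash algorithms defined for Subresource Integrity.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return algorithm + "-" + base64.b64encode(digest).decode("ascii")


def matches(content: bytes, integrity: str) -> bool:
    """Check content against a single-hash integrity value."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in SRI_ALGORITHMS:
        return False
    return sri_value(content, algorithm) == integrity


script = b"console.log('hi');"
integrity = sri_value(script)
```

The weakness this thread keeps circling is visible here: the integrity value lives in the HTML, so whoever can change the HTML can change the value too.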
Starts reading: "fantastic, this is what we've been needing! But... where is code signing?"
> One problem that WAICT doesn’t solve is that of provenance: where did the code the user is running come from, precisely?
> ...
> The folks at the Freedom of Press Foundation (FPF) have built a solution to this, called WEBCAT. ... Users with the WEBCAT plugin can...
A plugin. Sigh.
Fancy, deep transparency logs that track every asset bundle deployed are good. I like logging - this is very cool. But this is not the first thing we need.
The first thing we need is to be able to host a public signing key somewhere that browsers can fetch and use to automatically verify the signature on the root hash served up in that integrity manifest. Then point a tiny boring transparency log at _that_. That's the thing I really, really care about for non-equivocation. That's the piece that lets me host my site on Cloudflare Pages (or Vercel, or Fly.io, or Joe's Quick and Dirty Hosting) and still be sure that the software being run in my client's browser is the software I signed.
This is the pivotal thing. It needs to live in the browser. We can't leave this to a plugin.
I'll actually argue the opposite. Transparency is _the_ pivotal thing, and code signing needs to be built on top of it (it definitely should be built into the browser, but I'm just arguing the order of operations rn).
TL;DR you'll either re-invent transparency or end up with huge security holes.
Suppose you have code signing and no transparency. Your site has some way of signaling to the browser to check code signatures under a certain pubkey (or OIDC identity if you're using Sigstore). Suppose now that your site is compromised. What is to prevent an attacker from changing the pubkey and re-signing under the new pubkey? Or just removing the pubkey entirely and signaling no code signing at all?
There are three answers off the top of my head. Lmk if there's one I missed:
1. Websites enroll into a code signing preload list that the browser periodically pulls. Sites in the list are expected to serve valid signatures with respect to the pubkeys in the preload list.
Problem: how do sites unenroll? They can ask to be removed from the preload list. But in the meantime, their site is unusable. So there needs to be a tombstone value recorded somewhere to show that it's been unenrolled. That place it's recorded needs to be publicly auditable, otherwise an attacker will just make a tombstone value and then remove it.
So we've reinvented transparency.
2. User browsers remember which sites have code signing after first access.
Problem: This TOFU method offers no guarantees to first-time users. Also, it has the same unenrollment problem as above, so you'd still have to reinvent transparency.
3. Users visually inspect the public key every time they visit the site to make sure it is the one they expect.
Problem: This is famously a usability issue in e2ee apps like Signal and WhatsApp. Users have a noticeable error rate when comparing just one line of a safety number [1; Table 5]. To make any security claim, you'd have to argue that users would be motivated to do this check and get it right for the safety numbers for every security-sensitive site they access, over a long period of time. This just doesn't seem plausible.
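Option 2 can be sketched in a few lines of Python, which also makes its gaps easy to see. This is a toy model under made-up names (`TofuPins`, the origins, and the key strings are all hypothetical), not a description of any browser feature.

```python
from typing import Optional


class TofuPins:
    """Toy trust-on-first-use store: remember the first key seen per origin."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    def check(self, origin: str, served_key: Optional[str]) -> str:
        pinned = self._pins.get(origin)
        if pinned is None:
            # Nothing to compare against, so whatever is served gets trusted.
            if served_key is not None:
                self._pins[origin] = served_key
            return "first-use"
        if served_key == pinned:
            return "ok"
        if served_key is None:
            # Legitimate unenrollment or an attacker stripping the key?
            # The client cannot tell without some public record.
            return "key-removed"
        return "key-changed"


pins = TofuPins()
r1 = pins.check("chat.example", "key-A")
r2 = pins.check("chat.example", "key-A")
r3 = pins.check("chat.example", "key-B")
r4 = pins.check("chat.example", None)
r5 = pins.check("new.example", "attacker-key")
```

The last call is the first-time-user problem: an attacker's key is pinned as readily as an honest one. The `key-removed` branch is the unenrollment problem, and deciding what to do there is where a publicly auditable record comes back in.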
As a site owner, the best thing you can do for your users is to serve all your resources from a server you control. Serving JavaScript (or any resource) from a CDN was never a great idea and is pointless these days with browser domain isolation; you might as well just copy any third-party .js in your build process.
I wrote a coincidentally related rant post last week that didn't set the front page of HN on fire, so I won't bother linking to it, but the TL;DR is that a whole range of supply chain attacks just go away if you host the files yourself. Each third party you force your users to request from is an attack vector you don't control.
I get what this proposal is trying to achieve but it seems over complex. I would hate to have to integrate this into my build process.
You're right that, when your own server is trustworthy, fully self-hosting removes the need for SRI and integrity manifests. But in the case that your server is compromised, you lose all guarantees.
Transparency adds a mechanism to detect when your server has been compromised. Basically you just run a monitor on your own device occasionally (or use a third party service if you like), and you get an email notif whenever the site's manifest changes.
I agree it's far more work than just not doing transparency. But the guarantees are real and not something you get from any existing technology afaict.
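A rough sketch of the monitoring idea in Python, under loud assumptions: the manifest is modeled as a plain dict, `fetch_manifest` and `notify` are injected placeholders for "read the site's current manifest" and "send me an email", and a real monitor would read from the transparency log rather than trust the site itself. None of these names come from the proposal.

```python
import hashlib
import json
from typing import Callable, Optional


def manifest_digest(manifest: dict) -> str:
    """Hash a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ManifestMonitor:
    def __init__(
        self,
        fetch_manifest: Callable[[], dict],
        notify: Callable[[str], None],
    ) -> None:
        self._fetch = fetch_manifest
        self._notify = notify
        self._last_digest: Optional[str] = None

    def poll(self) -> bool:
        """Fetch the manifest once; return True (and notify) if it changed."""
        digest = manifest_digest(self._fetch())
        changed = self._last_digest is not None and digest != self._last_digest
        if changed:
            self._notify(f"manifest changed: {self._last_digest} -> {digest}")
        self._last_digest = digest
        return changed


# Hypothetical manifest contents and an in-memory stand-in for email.
current = {"/app.js": "hash-1"}
alerts: list[str] = []
monitor = ManifestMonitor(lambda: current, alerts.append)

first = monitor.poll()   # establishes the baseline
second = monitor.poll()  # nothing changed
current["/app.js"] = "hash-2"
third = monitor.poll()   # change detected, one alert recorded
```

If the alert arrives and you did not just deploy, that is the signal that the server may have been compromised.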
What about something like this: a JavaScript bookmarklet that fetches the HTML page with SRI, then turns the file into a blob URL and runs it. Within the HTML page, all resources are chained with SRI.
1) It seems strange that this spec isn't an extension of the previous cache manifest mechanism, which was very similar and served a very similar purpose: it listed all of the URLs of your web app, so they could all be pre-downloaded... it just didn't include the hashes, and that could easily have been added.
2) That the hashes are the primary key and the path is the value makes no sense, as it means that files can only have exactly one path. I have often ended up with the same file mapped to two places in my website for various reasons, such as collisions in purpose over time (but the URL is a primary key) or degenerate objects. Now, yes: I can navigate avoiding that, but why do I have to? The only thing this seems to be buying is the idea that the same path can have more than one hash, and even if we really want that, it seems like it would make a million times more sense to just make the value be an array of hashes, as that will make this file a billion times more auditable: "what hashes can this path have?" should be more clear than "I did a search of the file to check and I realized we had a typo with the same path in two places". No one -- including the browser implementing this -- is trying to do the inverse operation (map a hash to a path).
3) That this signs only the content of the file and not the HTTP status or any of the headers seems like an inexcusable omission and is going to end up resulting in some kind of security flaw some day (which isn't an issue for subresource integrity, as those cases don't have headers the app might want and only comes into play for successful status).
We even have another specification in play for how and what to sign (which includes the ability to lock in only a subset of the headers): Signed HTTP Messages. That should be consulted and re-used somehow.
4) Since they want to be able to allow the site to be hosted in more than one place anyway, they really should bite the bullet and make the identity of the site be a key, not a hostname, and the origin of a site should then become the public key. This would let the same site hosted by multiple places share the same local browser storage and just act like the exact same site, and it would also immediately fix all of the problems with "what if someone hacks into my server and just unenrolls me from the thing", as if they do that they wouldn't have the signing key (which you can keep very very offline) and, when a user hits reload, the new site they see would be considered unrelated to the one they were previously on. You also get provenance for free, and no longer have to worry about how to deal with unenrollment: the site just stops serving a manifest and it is immediately back to being the normal boring website, and can't access any of the content the user gave to the trusted key origin.
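Point 2 is easy to see with a toy example. The Python below builds the same three-file site both ways: keyed by hash, and keyed by path with a list of permitted hashes. The file names and the dict layouts are made up for illustration and are not the actual manifest format from the spec.

```python
import hashlib


def digest(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


# Hypothetical site where two paths serve identical bytes.
files = [
    ("/index.html", b"<h1>hi</h1>"),
    ("/home.html", b"<h1>hi</h1>"),
    ("/app.js", b"console.log(1)"),
]

# Hash as key, path as value: the second path silently replaces the first.
by_hash: dict[str, str] = {}
for path, body in files:
    by_hash[digest(body)] = path

# Path as key, list of permitted hashes as value: nothing is lost, and
# "what hashes can this path have?" is a single lookup.
by_path: dict[str, list[str]] = {}
for path, body in files:
    by_path.setdefault(path, []).append(digest(body))


def allowed(path: str, body: bytes) -> bool:
    """The check a browser would need: is this body permitted at this path?"""
    return digest(body) in by_path.get(path, [])
```

With the hash-keyed layout, `/index.html` has no entry at all once `/home.html` is added, which is the collision described in point 2.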
1. I didn't know about this [1] actually! It looks like it's been unsupported for a few years now. The format looks pretty barebones, and we'd still need hashes like you said, as well as "wildcard" entries. I reckon the JSON solution might still be the better choice, but this is good to have as a reference.
2. I agree, and this is something we have gone back and forth on. The nice thing about hashes as primary keys is you can easily represent a single path having many possible values, and you can represent "occurs anywhere" hashes by giving them the empty string. But the downside, like you mention, is that a hash cannot occur at multiple paths, which is far from ideal. I'll make an issue in the GitHub about this, because I don't think it's near settled.
3. I had read the spec [2] but never made this connection! You're right that it's not hard to imagine malleability sneaking in via headers and status codes. I'll make an issue for this.
4. I wanted to veer a bit from requiring sites to hold yet more cryptographic material than they already do. Yes you can keep signing keys "very very offline", but this requires a level of practice that I'm not sure most people would achieve. Also you run into key rotation annoyances as well. The current route to something like you describe is to have every site have its own transparency log entry (though they can share manifests and even asset hosts), and use code signing to link their instance to the source of truth.
Yet another horrible corporate solution for nonexistent problems, backed by pedantic ramblings of confident juniors leading negligent seniors, as it always was. Surprised? Never.
vader1|4 months ago
R_Spaghetti|4 months ago
some_furry|4 months ago
jmull|4 months ago
miloignis|4 months ago
CharlesW|4 months ago
andrewmcwatters|4 months ago
[deleted]
zb3|4 months ago
some_furry|4 months ago
See also: Certificate transparency.
evbogue|4 months ago
doomrobo|4 months ago
mimerz|4 months ago
thadt|4 months ago
doomrobo|4 months ago
[1] https://arxiv.org/abs/2306.04574
AndrewStephens|4 months ago
doomrobo|4 months ago
woranl|4 months ago
saurik|4 months ago
doomrobo|4 months ago
[1] https://en.wikipedia.org/wiki/Cache_manifest_in_HTML5
[2] https://www.rfc-editor.org/rfc/rfc9421.html
everdrive|4 months ago
jazzcript|4 months ago