The Brave post was discussed here extensively a few days ago, with several people (including me) pointing out how the author misunderstands what can be done today and what bundles make easier: https://news.ycombinator.com/item?id=24274968
> Everyone who visits the site will need essentially all of these files.
Not necessarily.
For instance, I wholly block JavaScript, and as a result I could avoid downloading ~283 KB of JavaScript resources from your page (out of ~412 KB of resources in total). I could still read your page fine.
If I understand correctly, with a WebBundle there would be no way for me to avoid downloading resources which are explicitly meant to not be downloaded by the way I configured my user agent?
There is a very simple way in which WebBundles will be used to bypass ad blockers:
The tool to create WebBundles will, in all likelihood, be created and maintained by Google. Google will program this tool to detect script tags with a src of ads.google.com/js/adscript.js and replace them with a local copy of the ad script embedded in the bundle. (They may do this by calling the feature something like "embed common third party resources" and also use it for files like jQuery and Google Fonts.) Then, adblockers will be unable to block the ad script, because it appears with a different path in each bundle.
In other words, you're right that randomizing URLs to evade ad-blockers requires server-side coordination. Conveniently, though, adopting WebBundles also requires server-side coordination, with the same company that stands to benefit from ad-blocker evasion.
> Brave concerns about WebBundles are legit in a location-based addressing Internet, but all of them would immediately be removed the moment we switch from a location-based addressing to a content-based addressing approach for the Internet.
I failed to find any good case being made for why content-addressable content would be any less likely to try to perform malicious actions than URL-addressed content. Is this just utopian wishful thinking or did I miss something?
Brave's concerns aren't legit, though. WebBundles don't change the request/response system or origin model of the web, and they don't change URLs or blockers' abilities at all. Brave is ascribing to them either powers they don't have, or powers you already have with a plain server.
The address of the content is a hash of the content. It's trivial for even low-power devices to verify that the content they received matches the address they requested.
We are moving by increments toward not letting content on a page send information directly to a separate origin.
With content addressable networks, it would be a challenge to enforce this, which implies rolling back security improvements, which means security regression.
For interactive content, at least part of the page has to have an origin. Maybe only the root document gets an origin, and the rest gets none, or the same one?
But then what happens with domain expiry?
It may mean that interactive documents require a web server, even if the bulk of the page, or even a document tree, is stitched together from addressable content.
I don't really get what the advantage is of making a big atomic blob the resource, versus independently updatable pieces as h2 streams / etags / client-cacheable responses.
A huge fraction of sites today are already bundled, just poorly. WebBundles solve the problem that devs are already working around with tools.
HTML, JavaScript, CSS and images are not natively bundleable. So tools like Webpack and Rollup dramatically transform the files to be able to bundle them. JavaScript is recompiled into a single file so that it's impossible to cache the individual files, and many features like dynamic import, asset fetches, and import.meta.url are partially broken.
CSS is concatenated, again breaking caching and preventing the use of individual files in separate CSS scopes.
WebBundles, by contrast, bundle the real files that would have been sent unbundled, and they work with any file/response type. WebBundles also compress much better than the unbundled files. So it's both a better HTTP/2 push and a better version of what sites are already doing with tools.
And since WebBundles work with the existing request/response pipeline, the files are individually fed into the network cache (and blockers, btw), which makes it possible to build delta bundles that only include updated files. This gives you the best of both worlds of bundled and unbundled serving.
I can see the advantages: for example, I could send you a page as an email attachment without any problems. There's clearly a use for it, because browsers let you do (more or less) this with their own custom formats.
MHTML tends not to run JavaScript because there is no origin attached to it. That's one of the benefits of Web Bundles: they can run with an origin attached, so they have access to the correct storage and other sandboxing primitives.
I'm wondering whether IPFS or other content-addressable networks handle version updates for documents as well as git handles code.
It might be nice to make websites that are more like PDFs: redistributable, downloadable, and storable. But when there are many versions of immutable content, the result is a mess, with people having random versions distributed all over the place. Having built-in history and being able to sync to HEAD would make this a lot easier.
I believe mutable pointers like that are outside the scope of IPFS itself, and instead fall within the realm of name systems like IPNS[1] and DNSLink[2]. I'm not sure if/how those systems track history.
Unsurprisingly, some people want to use blockchains[3][4] (those definitely have history).
While I get the argument that things vaguely like AMP could be used with IPFS, WebBundles in particular, by catting everything together, would seem to undermine IPFS's ability to dedup. No thanks.
jefftk | 5 years ago:
Afterwards I wrote up a response, explaining how bundles don't facilitate adblocker circumvention: https://www.jefftk.com/p/webbundles-and-url-randomization
(Disclosure: I work on ads at Google)
gorhill | 5 years ago
csande17 | 5 years ago
marijn | 5 years ago
spankalee | 5 years ago
rudolph9 | 5 years ago
hinkley | 5 years ago
sktrdie | 5 years ago
outsomnia | 5 years ago:
It's just PUSH gone crazy?
spankalee | 5 years ago
aclelland | 5 years ago:
You can read a discussion of some of the issues it causes here: https://github.com/WICG/webpackage/issues/551 Of course, the Brave browser also has some concerns about it: https://brave.com/webbundles-harmful-to-content-blocking-sec...
pcwalton | 5 years ago:
(I understand that the technology is technically neutral, but AMP is the practical reason why Google is pushing WebBundles.)
dmitriid | 5 years ago:
Signed Exchanges are considered harmful by Mozilla [1]
There's concern that Web Bundles are just Google's power play to be the sole gateway to the web [2]
[1] https://mozilla.github.io/standards-positions/
[2] https://twitter.com/Rich_Harris/status/1299015418913460226
dsun179 | 5 years ago:
https://en.m.wikipedia.org/wiki/MHTML
kinlan | 5 years ago
cordite | 5 years ago
skybrian | 5 years ago
matt_kantor | 5 years ago (links for the footnotes above):
[1]: https://docs.ipfs.io/concepts/ipns
[2]: https://dnslink.io
[3]: https://www.namecoin.org
[4]: https://ens.domains