The Brave post was discussed here extensively a few days ago, with several people (including me) pointing out how the author misunderstands what can be done today and what bundles make easier: https://news.ycombinator.com/item?id=24274968
> Everyone who visits the site will need essentially all of these files.
Not necessarily.
For instance, I wholly block JavaScript, and as a result I could avoid downloading ~283 KB of JavaScript resources from your page (out of ~412 KB of resources in total). I could still read your page fine.
If I understand correctly, with a WebBundle there would be no way for me to avoid downloading resources which are explicitly meant to not be downloaded by the way I configured my user agent?
There is a very simple way in which WebBundles will be used to bypass ad blockers:
The tool to create WebBundles will, in all likelihood, be created and maintained by Google. Google will program this tool to detect script tags with a src of ads.google.com/js/adscript.js and replace them with a local copy of the ad script embedded in the bundle. (They may do this by calling the feature something like "embed common third party resources" and also use it for files like jQuery and Google Fonts.) Then, adblockers will be unable to block the ad script, because it appears with a different path in each bundle.
In other words, you're right that randomizing URLs to evade ad-blockers requires server-side coordination. Conveniently, though, adopting WebBundles also requires server-side coordination, with the same company that stands to benefit from ad-blocker evasion.
> Brave concerns about WebBundles are legit in a location-based addressing Internet, but all of them would immediately be removed the moment we switch from a location-based addressing to a content-based addressing approach for the Internet.
I failed to find any good case being made for why content-addressable content would be any less likely to try to perform malicious actions than URL-addressed content. Is this just utopian wishful thinking or did I miss something?
Brave's concerns aren't legit, though. WebBundles don't change the request/response system or origin model of the web, and they don't change URLs or blockers' abilities at all. Brave is ascribing to them either powers they don't have, or powers you already have with a plain server.
The address of the content is a hash of the content. It's trivial for even low-power devices to verify that the content they received matches the address they requested.
We are moving by increments toward not letting content on a page send information directly to a separate origin.
With content addressable networks, it would be a challenge to enforce this, which implies rolling back security improvements, which means security regression.
For interactive content, at least part of the page has to have an origin. Maybe only the root document gets an origin, and the rest gets none, or the same one?
But then what happens with domain expiry?
It may mean that interactive documents require a web server, even if the bulk of the page, or even a document tree, is stitched together from addressable content.
I don't really get what the advantage is of making a big atomic blob the resource, versus independently updatable pieces as h2 streams / etags / client-cacheable responses.
A huge fraction of sites today are already bundled, just poorly. WebBundles solve the problem that devs are already working around with tools.
HTML, JavaScript, CSS and images are not natively bundleable. So tools like Webpack and Rollup dramatically transform the files to be able to bundle them. JavaScript is recompiled into a single file so that it's impossible to cache the individual files, and many features like dynamic import, asset fetches, and import.meta.url are partially broken.
CSS is concatenated, again breaking caching and preventing the use of individual files in separate CSS scopes.
WebBundles, by contrast, bundle the real files that would have been sent unbundled, and they work with any file/response type. WebBundles also compress much better than the unbundled files. So it's both a better HTTP/2 push and a better version of what sites are already doing with tools.
And since WebBundles work with the existing request/response pipeline, the files are individually fed into the network cache (and blockers, btw), which makes it possible to build delta bundles that only include updated files. This gives you the best of both worlds of bundled and unbundled serving.
I can see the advantages: for example, I could send you a page as an email attachment without any problems. There's clearly a use for it, because browsers let you do (more or less) this with their own custom formats.
MHTML tends not to run JavaScript because there is no origin attached to it. That's one of the benefits of Web Bundles: they can run with an origin attached, so they have access to the correct storage and other sandboxing primitives.
I'm wondering whether IPFS or other content-addressable networks handle version updates for documents as well as git handles code.
It might be nice to make websites that are more like PDFs: redistributable, downloadable, and storable. But when there are many versions of immutable content, the result is a mess, with people having random versions distributed all over the place. Having built-in history and being able to sync to HEAD would make this a lot easier.
I believe mutable pointers like that are outside the scope of IPFS itself, and instead fall within the realm of name systems like IPNS[1] and DNSLink[2]. I'm not sure if/how those systems track history.
Unsurprisingly, some people want to use blockchains[3][4] (those definitely have history).
While I get the argument that things vaguely like AMP could be used with IPFS, WebBundles in particular, by catting everything together, would seem to undermine IPFS's ability to dedup. No thanks.
jefftk | 5 years ago:
Afterwards I wrote up a response, explaining how bundles don't facilitate adblocker circumvention: https://www.jefftk.com/p/webbundles-and-url-randomization
(Disclosure: I work on ads at Google)
gorhill | 5 years ago
csande17 | 5 years ago
marijn | 5 years ago
spankalee | 5 years ago
rudolph9 | 5 years ago
hinkley | 5 years ago
sktrdie | 5 years ago
outsomnia | 5 years ago:
It's just PUSH gone crazy?
spankalee | 5 years ago
aclelland | 5 years ago:
You can read a discussion of some of the issues it causes here: https://github.com/WICG/webpackage/issues/551 Of course, the Brave browser also has some concerns about it: https://brave.com/webbundles-harmful-to-content-blocking-sec...
pcwalton | 5 years ago:
(I understand that the technology is technically neutral, but AMP is the practical reason why Google is pushing WebBundles.)
dmitriid | 5 years ago:
Signed Exchanges are considered harmful by Mozilla [1]
There's concern that Web Bundles are just Google's power play to be the sole gateway to the web [2]
[1] https://mozilla.github.io/standards-positions/
[2] https://twitter.com/Rich_Harris/status/1299015418913460226
dsun179 | 5 years ago:
https://en.m.wikipedia.org/wiki/MHTML
kinlan | 5 years ago
cordite | 5 years ago
skybrian | 5 years ago
matt_kantor | 5 years ago (links for the footnotes above):
[1]: https://docs.ipfs.io/concepts/ipns
[2]: https://dnslink.io
[3]: https://www.namecoin.org
[4]: https://ens.domains