
System loads web pages 34 percent faster by fetching files more effectively

177 points | qwename | 9 years ago | news.mit.edu

101 comments


tyingq|9 years ago

Interesting. The paper was released before HTTP/2 was in widespread use. They do show that their approach has significant improvements over SPDY alone...I wonder how the comparison to HTTP/2 alone would fare.

vitus|9 years ago

Personally, I'm more curious to see the comparison with QUIC, which eliminates the RTTs needed to set up TCP + TLS and multiplexes connections in a way that avoids the head-of-line blocking you can see with HTTP/2 over TCP. QUIC runs over UDP, so there's no requirement for in-order delivery across the entire connection, just within each resource stream (in this context, each request).

On that note, I do think it's a bit inaccurate to say that Google's efforts are/were primarily focused on data compression -- yes, they did introduce brotli, which is just a better LZ77 implementation (the primary difference from gzip is that the window size isn't fixed at 32 KB), but they also pioneered SPDY (which turned into HTTP/2 after going through the standards committee) and now QUIC.

(obligatory disclaimer that google gives me money, so I am biased)

breischl|9 years ago

Aren't they orthogonal? No matter how fast HTTP/2 is or how much it decreases connection setup times, requesting resources in the "right" order will always be faster than doing it in one of the "wrong" orders.

More efficient protocols might reduce the disparity, but there should always be one. Right?

k_lander|9 years ago

is there a significant difference in performance between HTTP/2 and SPDY?

leeoniya|9 years ago

but we can already make web pages load 500% faster by not shoveling a ton of shit, not loading scripts from 60 third-party domains (yes stop using CDNs for jQuery/js libs, those https connections aren't free - they're much more expensive than just serving the same script from your existing connection), reducing total requests to < 10, not serving 700kb hero images, 1.22MB embedded youtube players [1], 500kb of other js bloat, 200kb of webfonts, 150kb of bootstrap css :/

the internet is faster than ever, browsers/javascript are faster than ever, cross-browser compat is better than ever, computers & servers are faster than ever, yet websites are slower than ever. i literally cannot consume the internet without uMatrix & uBlock Origin. and even with these i often have to give up my privacy by selectively allowing a bunch of required shit from third-party CDNs.

no website/SPA should take > 2s on a fast connection (or > 4s on 3G) to be fully loaded. it's downright embarrassing. we can and must do better. we have everything we need today.

[1] https://s.ytimg.com/yts/jsbin/player-en_US-vfljAVcXG/base.js

chrisfosterelli|9 years ago

> stop using CDNs for jQuery/js libs, those https connections aren't free - they're much more expensive than just serving the same script from your existing connection

Do you have a source for this? My understanding is that, in real usage, it is cheaper to load common libraries from a public CDN because (for something like jQuery) the library is likely already cached from another website, and the browser may even have an open SSL connection to the CDN.

Obviously 60 separate CDNs is excessive, but I don't know if the practice altogether is a bad idea.

Ntrails|9 years ago

> 1.22MB embedded youtube players

A guy who I used to post with wrote a new forum for us all to post on (woo splinter groups). It's pretty cool. One of the things it does is serve a static image of the underlying youtube and then load it on click. When a 'tube might be quoted 7 times on a page - that's a pretty useful trick.

I'd just assumed this was a standard forum feature and then I opened a "Music Megathread" on an ipboard and holy shit loading 30 youtube players was painful.
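The click-to-load trick can be sketched in a few lines: render a static thumbnail first, and only build the heavy player iframe when the user clicks. The i.ytimg.com URL is YouTube's standard thumbnail pattern; the class names and wiring here are illustrative, not from any particular forum software.

```javascript
// Click-to-load sketch: serve a lightweight thumbnail <img> instead of
// the full embedded player, and swap in the iframe only on click.
function thumbnailMarkup(videoId) {
  return `<img class="yt-placeholder" data-video-id="${videoId}" ` +
         `src="https://i.ytimg.com/vi/${videoId}/hqdefault.jpg" alt="Play video">`;
}

function playerMarkup(videoId) {
  return `<iframe src="https://www.youtube.com/embed/${videoId}?autoplay=1" ` +
         `frameborder="0" allowfullscreen></iframe>`;
}

// In the browser, a single delegated listener swaps thumbnail for player:
//   document.addEventListener('click', (e) => {
//     const id = e.target.dataset && e.target.dataset.videoId;
//     if (id) e.target.outerHTML = playerMarkup(id);
//   });
console.log(thumbnailMarkup('dQw4w9WgXcQ'));
```

A page quoting the same video 7 times then costs 7 small images up front instead of 7 full player downloads.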

amelius|9 years ago

Actually I don't even read the articles anymore when on mobile. I just use HN, and hope somebody posts a TL;DR, or some relevant comment that gives some more information about the article. Only if this is not the case will I consider clicking on the article link. It's pretty sad actually.

I secretly wish there was some way that allows us (as a community) to collaboratively "pirate" articles, perhaps as a torrent (IPFS perhaps), so we only have to download the ascii text.

keypress|9 years ago

Back in the day we'd try and get page sizes down to less than 100k.

The Internet isn't fast for everyone. I (in the UK) have no 3G signal, let alone 4G, and my broadband speed is pitiful - but it will do. There is nothing I can do to ramp up the pipe speed. I end up turning off JS and images a lot of the time, because otherwise it kills me.

As a web dev, I don't care for bloat, so I find it particularly irksome, and currently it's enough to deter me from going mobile. I once dreamed of having a modern smartphone in my pocket with any Internet connection, but the friction today puts me off. The UK was recently slammed for its retrograde networks.

roryisok|9 years ago

> literally cannot consume the internet without uMatrix & uBlock Origin

hear hear. and on mobile, it's painful because I can't have those (on windows phone at least). planning on buying a DD-WRT compatible router soon so I can do some kind of router-level ad-blocking and browse on the phone again

PS: opera mobile for android has a built in adblocker

hawski|9 years ago

I am with you, but I don't believe this solution will ever be used by the majority of developers.

Browser caches should be bigger. They also should be more intelligent. It does not make sense to evict a library from cache if it is the most popular library used. Maybe having two buckets, one for popular libraries and another for the rest.
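The two-bucket idea can be sketched as a tiny cache where a whitelist of "popular" libraries is pinned and everything else gets plain LRU eviction. The class, the capacity, and the popular list below are all made up for illustration:

```javascript
// Toy two-bucket cache: entries for whitelisted popular libraries go in a
// pinned bucket that is never evicted; everything else is LRU-evicted.
// (A Map iterates in insertion order, which makes LRU bookkeeping easy.)
class TwoBucketCache {
  constructor(capacity, popularUrls) {
    this.capacity = capacity;         // max entries in the ordinary bucket
    this.popular = new Set(popularUrls);
    this.pinned = new Map();          // popular bucket: never evicted
    this.lru = new Map();             // ordinary bucket: LRU eviction
  }
  put(url, body) {
    if (this.popular.has(url)) { this.pinned.set(url, body); return; }
    this.lru.delete(url);             // refresh position on re-insert
    this.lru.set(url, body);
    if (this.lru.size > this.capacity) {
      const oldest = this.lru.keys().next().value;
      this.lru.delete(oldest);        // evict least recently inserted
    }
  }
  get(url) {
    return this.pinned.get(url) ?? this.lru.get(url);
  }
}
```

With this scheme, ten sites' worth of one-off assets can churn through the ordinary bucket without ever pushing jQuery out.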

I think it would help if the script tag had a hash attribute; then the cache could become more efficient. But without the first part it would be useless. Example:

  <script src="https://cdn.example.com/jquery.js"
          sha256="18741e66ea59c430e9a8474dbaf52aa7372ef7ea2cf77580b37b2cfe0dcb3fd7">
  </script>
Or a different syntax (whatever, I'm not the W3C):

  <script src="https://cdn.example.com/jquery.js">
  <hash>
    <sha256>
      18741e66ea59c430e9a8474dbaf52aa7372ef7ea2cf77580b37b2cfe0dcb3fd7
    </sha256>
  </hash>
  </script>
I would like to run an experiment, but since I'm not that experienced with web dev, it could take too much time for me. Test all major browsers with a fresh install and default settings: go to reddit or another link aggregator, load several links in the same order in every browser, and check how efficiently the cache was used. I would expect that after the 10th site is loaded, nothing would remain from the 1st one, even though the same version of some library (and maybe even the same CDN link) was used.

I am amazed how quickly fully static pages load even when I'm on a capped-speed mobile connection (after I've used up my 1.5 GB data package).

EDIT: The most helpful thing would be to have good dead-code removing compilers for JavaScript.

kup0|9 years ago

Yeah the problem really isn't the technology, it's our poor use of it.

jimlawruk|9 years ago

The use of CDNs is not primarily about speeding up page load times, but rather about offloading bandwidth from the web server. Low-budget websites don't always have money for server farms and are severely limited in how much bandwidth they can serve. One post to HN can take them down. Free CDNs are the poor man's approach to this problem.

Animats|9 years ago

Right. CNN now has over 50 trackers and other junk blocked by Ghostery.

EGreg|9 years ago

One gripe: the canonical CDNs may have the library already cached from your visits to other sites, which is faster.

I wish that web browsers would use content addressing to load stuff and do SRIs. If I already loaded a javascript file from another url, why load it again?

zeveb|9 years ago

This, this, a thousand times this.

There's a site I read, really like and financially support, but which has some pretty terrible slowness & UI issues. It's so bad that they've recently started a campaign to fix those issues. But when I check Privacy Badger, NoScript and uBlock, there's a reason it's so terribly slow: they're loading huge amounts of JavaScript and what can only be called cruft.

Honestly, I think that they'd come out ahead of the game if they'd just serve static pages and have a fundraising drive semi-annually.

pjmlp|9 years ago

The browsers should have stayed HTML/CSS but then someone had this idea to try to compete with native applications....

lostboys67|9 years ago

Testify Testify Brother Testify!

tedunangst|9 years ago

So, uh, what does it do? I mean, I can't even tell if it's a server or client side change.

tyingq|9 years ago

The research paper[1] describes Polaris. Basically, you have to make large, sweeping changes to your html, server side. Instead of your original page + js references, you serve a bunch of javascript that then dynamically recreates your page on the client side in the most performant way that it can:

• The scheduler itself is just inline JavaScript code.

• The Scout dependency graph for the page is represented as a JavaScript variable inside the scheduler.

• DNS prefetch hints indicate to the browser that the scheduler will be contacting certain hostnames in the near future.

• Finally, the stub contains the page’s original HTML, which is broken into chunks as determined by Scout’s fine-grained dependency resolution.

[1] http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf
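The scheduler's core job, fetching objects so that prerequisites always come first, is a topological sort over the dependency graph. A toy sketch (Kahn's algorithm; the example page graph is illustrative, not Polaris's actual code):

```javascript
// Toy sketch: given a dependency graph (object -> list of prerequisites),
// produce an order in which a scheduler could fetch and evaluate objects.
function fetchOrder(deps) {
  // indegree = number of unfetched prerequisites per object
  const indegree = new Map(
    Object.entries(deps).map(([node, prereqs]) => [node, prereqs.length]));
  const dependents = new Map(); // prerequisite -> objects waiting on it
  for (const [node, prereqs] of Object.entries(deps)) {
    for (const p of prereqs) {
      if (!dependents.has(p)) dependents.set(p, []);
      dependents.get(p).push(node);
    }
  }
  const ready = [...indegree].filter(([, d]) => d === 0).map(([n]) => n);
  const order = [];
  while (ready.length > 0) {
    const n = ready.shift();
    order.push(n); // all of n's prerequisites are already in `order`
    for (const d of dependents.get(n) || []) {
      indegree.set(d, indegree.get(d) - 1);
      if (indegree.get(d) === 0) ready.push(d);
    }
  }
  return order;
}

// Illustrative page: app.js depends on jquery.js, everything on index.html.
const order = fetchOrder({
  'index.html': [],
  'style.css': ['index.html'],
  'jquery.js': ['index.html'],
  'app.js': ['jquery.js'],
});
console.log(order.join(' -> '));
```

The real scheduler additionally overlaps fetches that are independent of each other, which is where the measured speedup comes from; a plain browser only discovers the graph edge by edge as it parses.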

naor2013|9 years ago

It's in between: a way for website developers to declare the dependencies between files (in HTML or JavaScript or whatever), plus a change browsers need so they know how to use this new data to make better requests to the server.

GrumpyNl|9 years ago

Talk to some people in the porn industry. They will tell you how important fast pages are. You will also be surprised what they have done to achieve this.

sanxiyn|9 years ago

This sounds potentially interesting. Care to elaborate?

mikeytown2|9 years ago

From my experience, preconnect is a big improvement for connecting to third-party domains. I'll also mention that once you have JS deferred, CSS on a third-party domain (Google Fonts) can cause major slowdowns in start-render metrics when using HTTP/2 on a slow connection: all the bandwidth goes to the primary domain's connection rather than to blocking resources, so images end up downloading before the external blocking CSS.
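For reference, the preconnect hint is a one-line resource hint in the page head; the hostname here is just the common Google Fonts example:

```html
<!-- Open the DNS + TCP + TLS connection early, before any resource on
     that host is actually requested (crossorigin is needed for fonts). -->
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
```

This shaves the connection-setup round trips off the first request to that host, which is exactly the cost being discussed upthread.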

jakeogh|9 years ago

The web is way better without JS. Rendering engines could in principle do the same (improved) dependency tracking.

kvz|9 years ago

I feel Webpack deserves a mention, as it resolves the dependencies at build time and compiles one (or a few chunked/entry-based) assets, hence also solving the problem of too many round trips.

WhiteSource1|9 years ago

What are you looking to learn about a CDN?

There are many ways to accelerate page speed and, like everything else, it's a question of costs and benefits. For most things, some level of technical debt is OK, and CDNs even for jQuery are good. Of course, good design and setting things up right are always best - and the other question is where your site traffic comes from.

uaaa|9 years ago

Is there a comparison with Google AMP?

andrewguenther|9 years ago

Paper was released way before AMP. Also not really related to AMP, so I don't think it's necessary to draw a comparison.

Thiez|9 years ago

> What Polaris does is automatically track all of the interactions between objects, which can number in the thousands for a single page. For example, it notes when one object reads the data in another object, or updates a value in another object. It then uses its detailed log of these interactions to create a “dependency graph” for the page.

> Mickens offers the analogy of a travelling businessperson. When you visit one city, you sometimes discover more cities you have to visit before going home. If someone gave you the entire list of cities ahead of time, you could plan the fastest possible route. Without the list, though, you have to discover new cities as you go, which results in unnecessary zig-zagging between far-away cities.

What a terrible analogy. Finding a topological sorting is O(|V|+|E|), while the traveling salesman problem is NP-complete.

mfonda|9 years ago

He's not making a comparison to the traveling salesman problem; he's saying the businessperson only intended to visit one city, but the trip ended up requiring visits to several additional cities.

It's not a terrible analogy. You request an HTML page and you don't know until after you load it (visit the initial city) exactly what other resources--images, css, js, etc.--you'll need to download (additional cities to visit).

to3m|9 years ago

That's amusing, and I wonder if this particular analogy was chosen deliberately. But I don't think there's anything wrong with it - it's designed to make intuitive sense to non-programming readers, not to be some rigorous description that can be automatically translated into optimal code.