Founder of NuevoCloud here. If I read this right, you guys used Cloudflare for HTTP/2. So let me ask you this: when you did your comparison, were all of the images cached (i.e. x-cache: hit) at the edge?
The reason I ask is because cloudflare, last I checked, still hasn't implemented http2's client portion. So when a file is not cached, it does this:
client <--http2--> edge node <--http 1.1--> origin server.
HTTP/2 is only used for the short hop between the client and edge node. The edge node then uses HTTP/1.1 for the connection to the origin server, which may be thousands of miles away.
In other words, depending on the client location and the origin server location, your test may have used HTTP/1.1 for the majority of the distance.
If you guys want to rerun this test on our network, we use http2 everywhere... your test would look like this on our network:
client <--http2--> edge node (closest to client) <--http2--> edge node (closest to server) <--http2--> origin server.
So even if your origin server doesn't support http2, it'll only use http 1.1 over the short hop between your server and the closest edge node.
You're welcome to email me if you want to discuss details you don't want to post here.
Edit: I should also mention that we use multiple HTTP/2 connections between our edge nodes and between the edge node and origin server, removing that bottleneck. So only the client <--> edge node hop is a single HTTP/2 connection.
To the best of my knowledge you are correct about how CloudFlare works. And yes, the edges were well and truly primed. For context, this data was collected over a period of about a month on real production pages with significant traffic.
I did not do any real tests and I might be completely wrong, but it seems to me that HTTP/2 is going to perform poorly over wireless links like 3G.
With HTTP/1 one had N TCP connections. Given the way TCP slowly ramps up the bandwidth used and rapidly backs off when a packet is lost, even if a packet was dropped (which will happen quite a lot on 3G), the other TCP streams were not delayed or blocked, and could even utilize the leftover bandwidth yielded by the stream that lost the packet.
With HTTP/2, however, there's one TCP connection, so dropped packets will cause under-utilization of the bandwidth. On top of that, a dropped packet will cause all frames after it to be delayed in the kernel receive buffer until it is retransmitted, while in the HTTP/1 case they would be available at the app level right away.
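A toy model of this effect (my own numbers, not from the thread, and real TCP dynamics are more complex): under TCP's additive-increase/multiplicative-decrease behaviour, a loss event roughly halves the congestion window of the one stream that saw the loss.

```javascript
// Toy AIMD model, not a real TCP simulator: one loss event halves the
// congestion window of exactly one stream; the others keep their full share.
function windowAfterOneLoss(streams, perStreamWindow) {
  const windows = Array(streams).fill(perStreamWindow);
  windows[0] /= 2; // the unlucky stream backs off
  return windows.reduce((a, b) => a + b, 0);
}

// Give the whole page 60 "units" of congestion window either way:
const h1Total = windowAfterOneLoss(6, 10); // six HTTP/1 connections -> 55 units
const h2Total = windowAfterOneLoss(1, 60); // one HTTP/2 connection  -> 30 units
```

After one drop, the six-connection page keeps over 90% of its aggregate sending rate, while the single HTTP/2 connection keeps only half until its window ramps back up.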
HTTP/2 being implemented on top of TCP always seemed like a weird choice. It should have been UDP, IMO. That's why network accelerators like PacketZoom make so much sense. Note: I work at PacketZoom; I have not done any in-depth research on HTTP/2, and this is my opinion, not necessarily the company's.
This is a real worry, but as with all things the actual behaviour of H2 on lossy networks is more complex than that.
TCP's congestion control algorithms don't work that well when you have many TCP streams competing for the same bandwidth. This is because while packet loss is a property of the link, not an individual TCP stream, each packet loss event necessarily only affects one TCP stream. This means the others don't get the true feedback about the lossiness of the connection. This behaviour can lead to a situation where all of your TCP streams try to over-ramp.
A single stream generally behaves better on such a link: it's getting a much more complete picture of the world.
However, your HOL blocking concern is real. This is why QUIC is being worked on. In QUIC, each HTTP request/response is streamed independently over UDP, which gets the behaviour you're talking about here, while also maintaining an overall view of packet loss for rate limiting purposes.
I worry about TCP window scaling (and full TCP windows) when only using one TCP connection. There is a good reason download managers use multiple connections to download one file: depending on the latency, the maximum transfer rate is capped because only so many TCP packets can be in flight simultaneously. I wonder if nobody ever thought about that... HTTP/1.x solved it (more by chance) with multiple connections.
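The cap being worried about here is the bandwidth-delay product limit; a back-of-envelope sketch with assumed numbers (a classic 64 KiB receive window without window scaling, and a 100 ms RTT):

```javascript
// With a fixed receive window, a single TCP connection can never move data
// faster than window / RTT, no matter how much capacity the link has.
function maxThroughputBytesPerSec(windowBytes, rttSeconds) {
  return windowBytes / rttSeconds;
}

const capOne = maxThroughputBytesPerSec(64 * 1024, 0.1); // 655,360 B/s (~5.2 Mbit/s)
// Six HTTP/1.x connections each bring their own receive window, so the
// aggregate cap is six times higher -- the accidental fix described above.
const capSix = 6 * capOne; // 3,932,160 B/s
```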
Have you looked at QUIC? It seems like it addresses the problems with HTTP/2 over TCP as well as providing some additional benefits, like speeding up secure connection establishment.
From the project page, key features of QUIC over existing TCP+TLS+HTTP2 include:
* Dramatically reduced connection establishment time
* Improved congestion control
* Multiplexing without head of line blocking
* Forward error correction
* Connection migration
I don't think that the server is in charge of prioritisation here. The server can do it, but there is no reason to push this responsibility onto the server when the browser can do it much better (for example, the server can't know what's in the viewport).
I expect this will be quickly sorted out by more mature HTTP/2 implementations in browsers. Downloading every image at once is obviously a bad idea, and I expect such naive behaviour will soon be replaced by decent heuristics (even just downloading eight resources at once should be better in nearly all cases).
I think the real solution here is for the browser to be able to communicate some sort of priority to the server, without having to limit itself to downloading a fixed number of files at once.
One way to "solve" the time to visual completion would be to make all the images, but especially the larger images, progressive scan. For very large images, the difference in visual quality between 50% downloaded and 100% downloaded on most devices isn't noticeable, so the page would appear complete in half the time.
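Incidentally, whether a JPEG is already progressive can be checked by scanning its marker segments; here's a hypothetical helper (not from the article) that looks for an SOF2 marker (progressive DCT, 0xFFC2) before any baseline SOF0/SOF1 marker:

```javascript
// Sketch of a progressive-JPEG detector. A JPEG is a stream of segments,
// each introduced by 0xFF + marker byte; most carry a 2-byte big-endian
// length. The frame type is declared by the SOF marker that appears first.
function isProgressiveJpeg(bytes) {
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return false; // no SOI
  let i = 2;
  while (i + 3 < bytes.length) {
    if (bytes[i] !== 0xff) return false;                  // lost marker alignment
    const marker = bytes[i + 1];
    if (marker === 0xc2) return true;                     // SOF2: progressive DCT
    if (marker === 0xc0 || marker === 0xc1) return false; // baseline/extended SOF
    if (marker === 0xd9 || marker === 0xda) return false; // EOI / scan data, no SOF seen
    const len = (bytes[i + 2] << 8) | bytes[i + 3];       // segment length, big-endian
    i += 2 + len;
  }
  return false;
}
```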
Totally. There are a bunch of ways to address the performance issue. As I alluded to at the end of the post, there are serious technology considerations when preprocessing so much image data.
We're currently looking at whether we can use IntersectionObserver for efficient lazy loading of images before they enter the viewport.
If there's a way to tell it not to render until x% downloaded, sure. Otherwise slower connections see the low-quality versions for a while, and that can be disconcerting, either to some users or to some PMs.
OTOH, progressive JPEGs tend to require much more memory to decode. I don't have specific numbers to cite, only anecdotal experience with image programs over the years (e.g., Java photo uploaders that choked on progressive JPEGs).
There are discussions happening on how browsers can allow authors to provide resource prioritisation hints. I'm curious to see where it goes.
We'd ideally like to be able to say "prioritise the 10 images in the viewport". You can hack it together relatively efficiently using IntersectionObserver now, but support isn't great.
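A minimal sketch of that hack, assuming `<img data-src="...">` placeholders; the `ObserverCtor` parameter is only there so the logic can be exercised outside a browser:

```javascript
// Lazy-load images shortly before they enter the viewport.
// IntersectionObserver is the standard browser API for this (support was
// still patchy when this discussion happened).
function lazyLoad(images, ObserverCtor = IntersectionObserver) {
  const observer = new ObserverCtor((entries, obs) => {
    for (const entry of entries) {
      if (entry.isIntersecting) {
        const img = entry.target;
        img.src = img.dataset.src; // kick off the real download
        obs.unobserve(img);        // each image only needs to fire once
      }
    }
  }, { rootMargin: '200px' });     // start loading 200px before it's visible
  images.forEach((img) => observer.observe(img));
  return observer;
}
```

In a browser you'd call it with `lazyLoad(document.querySelectorAll('img[data-src]'))`.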
CDNs are still going to have lower latency and higher bandwidth, and likely more ability to have long lived connections. Probably whatever mechanism develops to facilitate http/2 server pushed resources through a CDN will also include prioritization hints.
Starting to wish the AppCache manifest had actually been made to work; it could have been used as a queue of sorts to prioritise important assets on a webpage.
Did I read this right that http1 was with cdn A (unnamed?) and http2 was with cdn B (cloudflare)?
If so, you really can't draw any conclusions about the protocol difference when the pop locations, network designs, hardware and software configurations could easily have made the kinds of differences you're seeing.
By not moving our render-blocking assets like CSS, JS and fonts over to HTTP/2, we rule out performance changes due to improvements in head-of-line blocking.
Our images were always on a separate hostname, so the DNS lookup overhead is the same. We also did some initial benchmarking and found the new CDN to be more efficient than the old one.
Comparing two protocols using different providers, isn't that a bit like comparing apples and pears? And I have a doubt, which could be a bad assumption: is it on hardware you control or own, what exactly runs on it, and which other parties potentially use it?
Just now I finished separating the front-end and back-end (via a RESTful protocol), and this roughly halved performance compared to using a native library (from ~2000 payments/second on my laptop to ~1000). I expect HTTP/2 to make a greater percentage-wise difference here, although I admit I really have no idea how much, say, ZeroMQ would have reduced performance, compared to the halving I saw with HTTP/1.x.
I expect HTTP/2 to make a much greater difference in high performance applications, where overhead becomes more important, which static file serving doesn't really hit. So I think RESTful backend servers will see a much more noticeable performance increase, especially since, if you use Chrome at least, as an end-user you already get many of the HTTP/2 latency benefits through SPDY.
Useful related project: http://www.grpc.io/ is an excellent layer on top of HTTP/2 for comms between backend services. It's from Google, and used by Docker and Square among others. It even comes with a REST-focused gateway: https://github.com/grpc-ecosystem/grpc-gateway
Would that be a good case for websockets?
- Serve less data. The best speedup is when there's no more data to download; if throughput to clients is maxed out, decreasing page weight helps.
- Use async bootstrap JS code to load in other scripts once images are done loading or other page load events have fired.
- Load fewer images in parallel; use JS to load one row of images at a time.
- Use HTTP/2 push (which CloudFlare offers) to push some of the images/assets with any other response. Push images with the original HTML and you'll start getting the images to the browser before it even parses the HTML and starts any (prioritized) requests.
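For reference, CloudFlare's push implementation at the time was driven by `Link` preload headers on the origin response; a hypothetical header for a hero image would look like:

```http
Link: </images/hero.jpg>; rel=preload; as=image
```

Anything listed this way can be pushed from the edge alongside the HTML, so the browser receives it before parsing even begins.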
Wouldn't the standard solution of lazy loading images (and prioritizing critical CSS) help? Since they are now trying to load everything on a big page, they should only be trying to load what's above the fold.
We've recently moved to Google Cloud Storage from AWS because of HTTP/2. We had a bottleneck of the browser waiting when serving multiple large files (8+ files at 10 MB+ each).
I'm wondering if 99designs looked at any sort of domain sharding to get around the timing issues. If I understand correctly, wouldn't this get around the priority queue issue? Your JS, fonts, etc. coming from a different address than your larger images would create completely separate connections.
I'm not completely sure this would get around the issues mentioned, but I'm curious if it was looked at as a solution.
The priority queue isn't the issue. In fact, the priority queue is what kept our first paint times from tanking, because browsers prioritised render-blocking resources over images.
The issue was due to the variance in image size. An image that is significantly larger than the page average will load slower, since all images get an equal share of bandwidth (priority). Adding sharding wouldn't help, since the client only has a fixed amount of bandwidth to share and all images would still get the same share of it. Sharding could help if the bandwidth bottleneck was at the CDN, but that's rarely going to be the case.
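Rough arithmetic behind that point, with made-up numbers: nine 100 KB images plus one 2000 KB outlier on a 1000 KB/s client link.

```javascript
// Fair-share model: every active transfer gets an equal slice of bandwidth,
// and bandwidth freed by finished transfers is redistributed. Under that
// model the oversized outlier always finishes last, at total bytes / client
// bandwidth -- and sharding across hostnames cannot move that number,
// because the client's bandwidth is the same however it is split.
const bandwidthKBps = 1000;
const sizesKB = [100, 100, 100, 100, 100, 100, 100, 100, 100, 2000];

const totalKB = sizesKB.reduce((a, b) => a + b, 0); // 2900 KB
const outlierFinishSec = totalKB / bandwidthKBps;   // 2.9 s

// By contrast, giving the outlier the whole link to itself first would
// finish it at 2000 / 1000 = 2.0 s: prioritisation helps where sharding
// cannot.
```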
Domain sharding is an anti-pattern for HTTP/2. The reason being, for another domain the browser needs to make an expensive TLS handshake; with HTTP/2 on the same domain, it doesn't. We've done tests and moved away from even having a static domain.
IMHO HTTP/2 only partly solves HOL blocking. It will allow other streams to proceed if one stream is blocked due to flow control (the receiver doesn't read from the stream). E.g. if you have multiple parallel downloads over a single HTTP/2 connection, one blocked/paused stream won't block the others.
However, it has no way to let individual streams proceed when the lost packets hold data for only a single stream.
Thanks for posting your findings - very useful data. It would be interesting to see the Webpagetest waterfalls in greater detail if you're able to share that.
You planning to use your resource hints to enable server push at CDN edge?
Server push at the edge is a problem atm. Current push semantics require the HTML document's response to say which resources to push. That's an issue if you're serving assets off a CDN domain.
Asset domains make less sense with h2 from a performance perspective, but there are still security concerns that need to be addressed.