
The web at maximum FPS: How WebRender gets rid of jank

853 points | bpierre | 8 years ago | hacks.mozilla.org | reply

207 comments

[+] pacaro|8 years ago|reply
Humorously enough, when I worked on a team that was writing a graphical web browser for mobile in the late '90s [1], they used a display list for rendering.

The reasoning was somewhat different: web pages were essentially static (we didn't do "DHTML"), so if the page rendering process could generate an efficient display list, the page source could be discarded and only the display list needed to be held in memory. Rendering could then be pipelined with reading the page over the network, so the entire page was never in memory at once.
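The idea can be sketched as a retained list of primitive draw commands that outlives the page source. A minimal sketch (the item types and names here are illustrative, not that browser's actual structures):

```rust
// A display list: the retained output of layout, replayable without
// keeping the original page source in memory.
#[derive(Debug, Clone, PartialEq)]
enum DisplayItem {
    Rect { x: i32, y: i32, w: i32, h: i32, color: u32 },
    Text { x: i32, y: i32, glyphs: Vec<u16> },
    Image { x: i32, y: i32, handle: u32 },
}

// Once layout has produced the list, rendering just replays the items
// in order; here we count the primitives instead of rasterizing them.
fn paint(list: &[DisplayItem]) -> usize {
    let mut painted = 0;
    for item in list {
        match item {
            DisplayItem::Rect { .. } => painted += 1, // would fill a rectangle
            DisplayItem::Text { glyphs, .. } => painted += glyphs.len(),
            DisplayItem::Image { .. } => painted += 1,
        }
    }
    painted
}

fn main() {
    let list = vec![
        DisplayItem::Rect { x: 0, y: 0, w: 320, h: 240, color: 0xFFFFFF },
        DisplayItem::Text { x: 8, y: 16, glyphs: vec![72, 105] },
    ];
    println!("painted {} primitives", paint(&list));
}
```

Because the list is append-only and ordered, it can also be built incrementally as bytes arrive over the network, which is the pipelining the comment describes.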

Full Disclosure: while I later wrote significant components of this browser (EcmaScript, WmlScript, SSL, WTLS, JPEG, PNG), the work I'm describing was entirely done by other people!

[1] - I joined in 97, the first public demo was at GSM World Congress Feb 98

[+] DonHopkins|8 years ago|reply
I developed a hypermedia browser called HyperTIES for NeWS that was scriptable in FORTH, and whose formatter could output FORTH code to layout and paginate an article for a particular screen size, which could be saved out in a binary FORTH image that you could restart quickly.

The FORTH code then downloaded PostScript code into the NeWS server, where it would be executed in the server to draw the page.

It even had an Emacs interface written in Mocklisp!

http://www.donhopkins.com/home/archive/HyperTIES/ties.doc.tx...

http://www.donhopkins.com/home/images/HyperTIESDiagram.jpg
http://www.donhopkins.com/home/images/HyperTIESAuthoring.jpg

http://www.donhopkins.com/drupal/node/101
http://www.donhopkins.com/drupal/node/102

http://www.donhopkins.com/home/ties/
http://www.donhopkins.com/home/ties/fmt.f
http://www.donhopkins.com/home/ties/fmt.c
http://www.donhopkins.com/home/ties/fmt.cps
http://www.donhopkins.com/home/ties/fmt.ps
http://www.donhopkins.com/home/ties/ties-2.ml

[+] pohl|8 years ago|reply
Now that this is closer to shipping, I'm curious what impact this would have on battery life. On the one hand, this is lighting up more silicon; on the other hand: a faster race to sleep, perhaps?

Have there been any measurements on what the end result is on a typical modern laptop?

[+] anon1253|8 years ago|reply
Just tried it with the Nightly by setting gfx.webrender.enabled to true in about:config. Wow, that thing flies. It's seriously amazing. And so far no bugs or visual inconsistencies I could detect. Firefox is really making great progress on this front!
[+] 482794793792894|8 years ago|reply
There are more steps necessary to enable WebRender at full capacity.

I presume, though, that things get buggier then, and the performance regressions that can introduce might actually make it feel slower for now. I don't know, though; I haven't tested it with just gfx.webrender.enabled.

You can find the current full list of steps to enable WebRender here: https://mozillagfx.wordpress.com/2017/09/25/webrender-newsle...

[+] mycoborea|8 years ago|reply
Yeah I'm pretty stunned at the speed improvements. I was getting a little worried there after Australis.
[+] Antrikshy|8 years ago|reply
I really want to use Firefox full time, but I miss Safari's multi-touch gesture integration.
[+] jacob019|8 years ago|reply
Love the little stick figure representations of the threads/cores.
[+] Vinnl|8 years ago|reply
That's to Lin Clark's credit.
[+] vvanders|8 years ago|reply
Good stuff.

Speaking of rendering text glyphs on the GPU, there's a really clever trick (commonly called Loop-Blinn, after the two authors): https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch25....

You can pretty much just use the existing bezier control points from TTF as-is which is really nice.
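The core of the trick is tiny. A minimal sketch of the quadratic inside/outside test (CPU-side here for illustration; the real thing runs in a fragment shader, and this framing is mine, not the paper's code): each triangle covering a quadratic Bezier segment gets per-vertex (u, v) coordinates (0,0), (0.5,0), (1,1), and after interpolation the curve is exactly the zero set of f(u, v) = u² − v.

```rust
// Loop-Blinn quadratic test: after the hardware interpolates the
// per-vertex (u, v) coordinates, the curve is where u^2 - v = 0.
// A fragment shader keeps or discards each pixel based on the sign.
fn inside_quadratic(u: f32, v: f32) -> bool {
    u * u - v <= 0.0
}

fn main() {
    // A point on the curve at parameter t interpolates to (u, v) = (t, t^2),
    // so f = 0 there: the boundary is hit exactly, with no tessellation.
    assert!(inside_quadratic(0.5, 0.25));
    println!("{} {}", inside_quadratic(0.5, 0.5), inside_quadratic(0.5, 0.1));
}
```

This is why the TTF control points can be used as-is: the (u, v) assignment does all the work, and no curve flattening is needed.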

[+] pcwalton|8 years ago|reply
If only it were as simple as just using Loop-Blinn. :) The technique described there will produce unacceptably bad antialiasing for body text. Loop-Blinn is fine if you want fast rendering with medium quality antialiasing, though. (Incidentally, it's better to just use supersampling or MLAA-style antialiasing with Loop-Blinn and not try to do the fancy shader-based AA described in that article.)

Additionally, the original Loop-Blinn technique uses a constrained Delaunay triangulation to produce the mesh, which is too expensive (O(n^3) IIRC) to compute in real time. You need a faster technique, which is really tricky because it has to preserve curves (splitting when convex hulls intersect) and deal with self-intersection. Most of the work in Pathfinder 2 has gone into optimizing this step. In practice people usually use the stencil buffer to compute the fill rule, which hurts performance as it effectively computes the winding number from scratch for each pixel.
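For readers unfamiliar with the fill rule mentioned above, the per-pixel work the stencil-buffer approach effectively redoes for every pixel is a winding-number computation. A sketch against a polygonal outline (illustrative; real path rendering also handles the curve segments):

```rust
// Nonzero winding number of a point against a closed polygon outline.
// The stencil-buffer technique computes this per pixel, per frame.
fn winding_number(px: f32, py: f32, poly: &[(f32, f32)]) -> i32 {
    let mut wn = 0;
    for i in 0..poly.len() {
        let (x0, y0) = poly[i];
        let (x1, y1) = poly[(i + 1) % poly.len()];
        // Cross product: which side of the directed edge the point lies on.
        let cross = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0);
        if y0 <= py {
            if y1 > py && cross > 0.0 {
                wn += 1; // upward crossing with the point to the left
            }
        } else if y1 <= py && cross < 0.0 {
            wn -= 1; // downward crossing with the point to the right
        }
    }
    wn
}

fn main() {
    let square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)];
    // Nonzero rule: a point is inside when its winding number is non-zero.
    println!("{} {}",
        winding_number(0.5, 0.5, &square),
        winding_number(2.0, 0.5, &square));
}
```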

The good news is that it's quite possible to render glyphs quickly and with excellent antialiasing on the GPU using other techniques. There's lots of miscellaneous engineering work to do, but I'm pretty confident in Pathfinder's approach these days.

[+] Animats|8 years ago|reply
Do they generate those triangles for each instance of each glyph, or only once for each character in the font?
[+] kidfiji|8 years ago|reply
I just love how they simplify something that's seemingly esoteric to me. I work with the web and I still have a lot to learn.
[+] winter_blue|8 years ago|reply
A lot of credit for that goes towards Lin Clark, who's done some amazing expository articles like this before.
[+] bsimpson|8 years ago|reply
Great write-up - thanks for sharing.

The name "WebRender" is unfortunate though. Things with a "Web" prefix - "Web Animations", "WebAssembly", "WebVR" - are typically cross-browser standards. This is just a new approach Firefox is using for rendering. It doesn't appear to be part of any standard.

[+] 482794793792894|8 years ago|reply
I remember reading at some point that WebRender could actually be isolated relatively easily and then applied to basically any browser. That has sort of already happened, with the move from Servo into Gecko.

So, it might actually turn into somewhat of a pseudo-standard.

[+] djhworld|8 years ago|reply
This is fantastic (for me)

I'd largely forgotten what pixel shaders actually were, so it was nice to get a high level understanding through this article, especially with the drawings!

[+] frostwhale|8 years ago|reply
I was already extremely pleased with the Firefox Quantum beta, they really are stepping their game up. If this is truly as clean as they say it is, web browsing on cheap computers just got much smoother.
[+] stevenhubertron|8 years ago|reply
I really appreciate the time they are taking to describe the changes in an easy to understand way. The sketches and graphics really help explain a pretty complex subject.
[+] shmerl|8 years ago|reply
Is it going to use Vulkan? Sounds like a good fit for proper parallelized rendering.

UPDATE: Ah, I see it's mentioned in the future work: https://github.com/servo/webrender/wiki#future-work

    Vulkan?
    This could possibly make some of the serial
    steps above able to be parallelized further.
So it will be using OpenGL then?
[+] 482794793792894|8 years ago|reply
Vulkan has been a consideration from the earliest architectural decisions in WebRender. So the internal pipelines are all set up to map onto Vulkan's pipelines.

It's actually OpenGL that fits the architecture less well, but for now it's still easier to bundle WebRender's pipelines together and hand that to OpenGL.

[+] larsberg|8 years ago|reply
The awesome folks at Szeged University have been working with our team on both Vulkan and native DX11 backends!
[+] kevindqc|8 years ago|reply
>For a typical desktop PC, you want to have 100 draw calls or fewer per frame

Don't PC games use thousands of draw calls per frame?

[+] pcwalton|8 years ago|reply
They do, but we're targeting Intel HD quality graphics, not gaming-oriented NVIDIA and AMD GPUs.

That said, even Intel GPUs can often deal with large numbers of draw calls just fine. It's mobile where they become a real issue.

Aggressive batching is still important to take maximum advantage of parallelism. If you're switching shaders for every rect you draw, then you frequently end up bottlenecked on the CPU.
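The batching idea can be sketched in a few lines: group commands by shader so state switches, and hence draw calls, are minimized. (Names and types here are illustrative, not WebRender's API.)

```rust
// Each command is (shader_id, primitive_id). Sorting by shader brings
// same-shader work together; one draw call then covers each run.
fn count_draw_calls(mut commands: Vec<(u32, u32)>) -> usize {
    commands.sort_by_key(|&(shader, _)| shader);
    let mut calls = 0;
    let mut last: Option<u32> = None;
    for (shader, _) in commands {
        if last != Some(shader) {
            calls += 1; // a new batch means one more draw call
            last = Some(shader);
        }
    }
    calls
}

fn main() {
    // Four primitives alternating between two shaders: submitted in order
    // that would be 4 draw calls, but batching reduces it to 2.
    println!("{}", count_draw_calls(vec![(1, 0), (2, 1), (1, 2), (2, 3)]));
}
```

The catch, of course, is that reordering is only legal when the primitives don't overlap in a way that changes the final image, which is part of what makes real batchers non-trivial.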

[+] amelius|8 years ago|reply
> What if we stopped trying to guess what layers we need? What if we removed this boundary between painting and compositing and just went back to painting every pixel on every frame?

This feels a bit like cheating. Not all devices have a GPU. Would Firefox be slow on those devices?

Also, pages can become arbitrarily complicated. This means that an approach where compositing is used can still be faster in certain circumstances.

[+] metajack|8 years ago|reply
To address your second point, you seem to be saying that missing the frame budget once and then compositing the rest of the time would be better than missing the frame budget every time.

That is certainly true, but a) the cases where you can do everything as a compositor optimization are very few (transform and opacity mostly) so aside from a few fast paths you'd miss your frame budget all the time there too, and b) we have a lot of examples of web pages that are slow on CPU renderers and very fast on WebRender and very few examples of the opposite aside from constructed edge case benchmarks. Those we have found had solutions and I suspect the other cases will too.

As resolution and framerate scale, CPUs cannot keep up. GPUs are the only practical path forward.

[+] aneutron|8 years ago|reply
Actually, virtually every device the average consumer uses has a GPU. For instance, even Atom processors have GPUs. Granted, they don't have as many cores as a full-fledged nVidia GPU, nor as much dedicated memory, but they are still GPUs, with several tens of cores and specialized APIs designed specifically for the tasks at hand. Plus, they offload (ish) the CPU.
[+] fritzy|8 years ago|reply
I imagine the render task tree also has to determine which intermediate textures to keep in the texture cache, and which ones will likely need to be redone in the next frame. That kind of optimization has to be tricky.
[+] pcwalton|8 years ago|reply
Yeah, it's one of the two hard problems in computer science after all :)

In practice LRU caches work pretty well.
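For concreteness, a minimal LRU eviction policy for a texture cache can be sketched like this (illustrative only; a linear scan like this is fine for a sketch, while a real cache would use an ordered map):

```rust
// Minimal LRU cache of texture keys: on overflow, evict the texture
// that was used least recently.
struct LruTextureCache {
    capacity: usize,
    entries: Vec<u64>, // texture keys, most recently used last
}

impl LruTextureCache {
    fn new(capacity: usize) -> Self {
        LruTextureCache { capacity, entries: Vec::new() }
    }

    // Record a use of `key`; returns the evicted key, if any.
    fn touch(&mut self, key: u64) -> Option<u64> {
        if let Some(pos) = self.entries.iter().position(|&k| k == key) {
            let k = self.entries.remove(pos); // refresh recency
            self.entries.push(k);
            return None;
        }
        self.entries.push(key);
        if self.entries.len() > self.capacity {
            return Some(self.entries.remove(0)); // least recently used
        }
        None
    }
}

fn main() {
    let mut cache = LruTextureCache::new(2);
    cache.touch(1);
    cache.touch(2);
    cache.touch(1); // refreshes texture 1
    println!("{:?}", cache.touch(3)); // evicts 2, the least recently used
}
```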

[+] azinman2|8 years ago|reply
Won’t this cause a lot of work to be done for a blinking cursor? Curious about battery drain, I/O overhead, general CPU usage, etc.
[+] pcwalton|8 years ago|reply
With a compositor you're already drawing every pixel every frame on the GPU, whether it's just a cursor blinking or not. The WR approach basically only adds a negligible amount of vertex shading time.
[+] poizan42|8 years ago|reply
I tried testing it out on a ThinkPad T61 to see how well it works with an older embedded GPU (Intel 965 Express), but I can't enable it (on Windows 10) because D3D11 compositing is disabled; it says D3D11_COMPOSITING: Blocklisted; failure code BLOCKLIST_

So does that mean that it is known not to work with that GPU? Can you override the blocklist to see what happens?

Edit: It also says:

> Direct2D: Blocked for your graphics driver version mismatch between registry and DLL.

and

> CP+[GFX1-]: Mismatched driver versions between the registry 8.15.10.2697 and DLL(s) 8.14.10.2697, reported.

Indeed that is correct: the driver is marked as version 8.15.10.2697, but the file version of the DLLs is 8.14.10.2697. This seems to be intentional by Microsoft or Intel; note that the build numbers are still the same. Firefox is quite naive if it thinks it can just match those.

[+] JepZ|8 years ago|reply
While I would consider myself more a Golang fan than a Rust fan, I am impressed by the speed at which the Mozilla team is changing fundamental parts of their browser, and somehow I believe Rust has something to do with that speed.
[+] pimeys|8 years ago|reply
I've been working professionally with Rust for a year now. Since I got over the first wall, it has become the best tool I've had for creating backend applications. I have history with at least nine different languages during my professional career, but nothing comes close to the confidence and ergonomics that the Rust ecosystem's tools provide.

Firefox, especially the new Quantum version is awesome. But Rust as a side product might be the best thing Mozilla brought us. I'm truly thankful for that.

[+] markdog12|8 years ago|reply
Is WebRender working on Android Firefox Nightly yet?

Update: about:support says not ready for Android

[+] Brakenshire|8 years ago|reply
I asked this on a thread a few weeks back, if I recall they're targeting 59 for Android.
[+] aneutron|8 years ago|reply
On nightly, the flag is available in about:config and I already enabled it.
[+] esaym|8 years ago|reply
Will there finally be a unified use of the GPU on all platforms (win, mac, linux, etc) or will WebRender just be a Windows only feature for quite some time?
[+] hexane360|8 years ago|reply
I have WebRender working on Linux with Intel 5500 integrated graphics. Hardware acceleration is still a bit glitchy though I'm afraid (with or without WebRender).

To enable, toggle 'layers.acceleration.force-enabled' as well as 'gfx.webrender.enabled'

edit: It's also working through my Nvidia 950m (through bumblebee), although subjectively it seems to have a little more lag this way.

[+] madez|8 years ago|reply
Why are they so obsessed with 60 fps? 120 fps looks considerably better, and there are other effects, like smear and judder, that keep decreasing even at significantly higher frame rates, say 480 fps [1].

[1] http://blogs.valvesoftware.com/abrash/down-the-vr-rabbit-hol...

[+] kibwen|8 years ago|reply
The WebRender folks are well aware that higher framerates are the future. Here's a tweet from Jack Moffitt today, a Servo engineer (and Servo's technical lead, I believe): https://twitter.com/metajack/status/917784559143522306

"People talk about 60fps like it's the end game, but VR needs 90fps, and Apple is at 120. Resolution also increasing. GPUs are the only way. Servo can't just speed up today's web for today's machines. We have to build scalable solutions that can solve tomorrow's problems."

[+] sturmen|8 years ago|reply
As everyone said, 60fps is not the destination but merely a waypoint. It's a good goal, considering 99% of screens that are in use today refresh at 60 Hz or their regional equivalent. Higher refresh rates are next.
[+] aneutron|8 years ago|reply
Not an expert, but I feel that was more of an analogy/image to convey what they were aiming for. The real objective is not 60 fps; the real objective is to use the GPU to do tasks that it was designed for. Plain and simple. This, however, gives the user a smoother experience, and 60 fps generally gives a noticeable difference.
[+] metajack|8 years ago|reply
We're not, as other people have said in other comments. On normal content you can often see WebRender hit 200+ fps if you don't lock it to the frame rate. To see this for yourself, run Servo on something with -Z wr-stats, which will show you a performance overlay.
[+] 482794793792894|8 years ago|reply
I don't think they are obsessed with 60 FPS; that's just what, for most people, is synonymous with a smooth experience, and it is often not met by browsers at this point in time.

Here's for example an early demo showing Wikipedia at ridiculous frames per second (starts at 0:26:00): https://air.mozilla.org/bay-area-rust-meetup-february-2016/

In the video, he says 500 FPS, but assuming there's no more complicated formula behind this, I think it would actually be 2174 FPS. (0.46 ms GPU time per frame -> 1/0.00046s = 2173.913 FPS)
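The arithmetic behind that correction is just the reciprocal of the frame time:

```rust
// Converting GPU frame time to a theoretical frame rate:
// 0.46 ms per frame works out to roughly 2174 frames per second.
fn fps_from_frame_time(seconds: f64) -> f64 {
    1.0 / seconds
}

fn main() {
    println!("{:.0}", fps_from_frame_time(0.00046));
}
```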

[+] stcredzero|8 years ago|reply
Why are they so obsessed with 60 fps?

Baby steps.

[+] to3m|8 years ago|reply
I don't think 120Hz is as common as 144Hz. But the vast majority of displays update at 60Hz, so you might as well aim for that.