This is essentially how Pathfinder works, with 16x16 tiles instead of 32x32. It also has a pure-GPU mode that does tile setup on GPU instead of CPU, which is a nice performance boost, though a lot more complex. Note that if you're doing the setup on CPU, the work can be parallelized across cores and I highly recommend this for large scenes. The main difference, which isn't large, is that Pathfinder draws the tiles directly instead of drawing a large canvas-size quad and doing the lookup in the fragment shader.
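The CPU-side tile setup described above is simple to sketch. A hedged illustration in Python (not Pathfinder's actual code; 16x16 tiles, straight segments only, with curves assumed to be flattened to lines beforehand):

```python
TILE = 16

def bin_segments(segments):
    """segments: list of ((x0, y0), (x1, y1)) line segments. Returns a dict
    mapping tile coordinates to the segments whose bounding boxes touch
    that tile."""
    tiles = {}
    for seg in segments:
        (x0, y0), (x1, y1) = seg
        tx0, tx1 = int(min(x0, x1)) // TILE, int(max(x0, x1)) // TILE
        ty0, ty1 = int(min(y0, y1)) // TILE, int(max(y0, y1)) // TILE
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles.setdefault((tx, ty), []).append(seg)
    return tiles

bins = bin_segments([((2, 2), (30, 10))])
print(sorted(bins))  # [(0, 0), (1, 0)] -- the segment touches two tiles
```

Binning like this is embarrassingly parallel across segments (with a per-tile merge step) or across tiles, which is what makes the multi-core CPU setup worthwhile.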
When I originally checked, Slug works in a similar way but doesn't do tiling, so it has to process more edges per scanline than Pathfinder or piet-gpu. Slug has gone through a lot of versions since then though and I wouldn't be surprised if they added tiling later.
Would you recommend Pathfinder for real-world use? Of course, I know that you're no longer working on it, but I'd like to know if there are any significant bugs/drawbacks in using it. For context, I'm coding a simple vector graphics app that needs to resize and render quite complex 2d polycurves in real time. So far, the only thing I found working was Skia, which is good but not fast enough to do the stuff I need in real time (at least on low-end devices).
Tiling doesn't work too well under domain transforms--3d environments, dynamic zoom, etc. That's why I am betting on high-quality space partitioning. Slug space partitioning is not amazing; I believe it still processes O(n) curves per fragment in a horizontal band.
A post about vector graphics, and the word "stroke" appears zero times ...
> Much better approach for vector graphics is analytic anti-aliasing. And it turns out, it is not just almost as fast as a rasterizer with no anti-aliasing, but also has much better quality than what can be practically achieved with supersampling.
> Go over all segments in shape and compute area and cover values continuously adding them to alpha value.
This approach is called "coverage to alpha". The author will be surprised to learn about the problem of coverage-to-alpha conflation artifacts. E.g. if you draw two shapes with exactly the same geometry on top of each other, but with different colors, the correct result contains only the color of the top shape, but with "coverage to alpha" you get a bit of color bleeding around the edges (the conflation artifacts). I think Guzba also gives other examples in this thread.
Also, they did not mention the other hard problems like stroke offsetting, nested clipping and group opacity etc.
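The conflation artifact described above is easy to reproduce with a few lines of arithmetic. A sketch in Python ("source over" compositing, one hypothetical edge pixel at 50% coverage, white background):

```python
def over(src, alpha, dst):
    # Per-channel "source over" compositing with a scalar alpha.
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))

background = (1.0, 1.0, 1.0)           # white
red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
coverage = 0.5                         # a pixel on the shared edge

# Coverage-to-alpha: each shape is blended independently with alpha = coverage.
conflated = over(green, coverage, over(red, coverage, background))

# Correct: only the topmost shape contributes at this pixel.
correct = over(green, coverage, background)

print(conflated)  # (0.5, 0.75, 0.25)
print(correct)    # (0.5, 1.0, 0.5)
```

The two results differ even though the geometry is identical: in the coverage-to-alpha version the red shape underneath still influences the edge pixel, which is exactly the bleeding described above.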
This seems like a silly way to do vector graphics in a shader.
What I've done in the past is representing the shape as a https://en.wikipedia.org/wiki/Signed_distance_function to allow each pixel to figure out if it is inside or outside the shape. This avoids the need to figure out the winding.
Anti-aliasing is implemented as a linear interpolation for distance values near zero. This also lets you control the "thickness" of the shape boundary: the edge becomes more blurry as you increase the lerp length.
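A minimal sketch of this idea in Python, with a circle SDF and a linear ramp standing in for the fragment shader's lerp (`aa_width` here plays the role of the "lerp length"; widening it blurs the edge):

```python
import math

def circle_sdf(x, y, cx, cy, r):
    # Negative inside the circle, zero on the boundary, positive outside.
    return math.hypot(x - cx, y - cy) - r

def coverage(d, aa_width=1.0):
    # Linear ramp around d = 0, clamped to [0, 1].
    return min(1.0, max(0.0, 0.5 - d / aa_width))

inside  = coverage(circle_sdf(0.0, 0.0, 0.0, 0.0, 10.0))   # d = -10 -> 1.0
edge    = coverage(circle_sdf(10.0, 0.0, 0.0, 0.0, 10.0))  # d =   0 -> 0.5
outside = coverage(circle_sdf(20.0, 0.0, 0.0, 0.0, 10.0))  # d = +10 -> 0.0
```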
Signed distance fields only work well for relatively simple graphics.
If you have highly detailed characters like Chinese or emojis, you need larger resolution to faithfully represent every detail. The problem is that SDFs are sampled uniformly over the pixel grid. If the character is locally complex, a high resolution is required to display it, but if the character has simple flat regions, memory is wasted. One way to get around excessive memory requirements is to store the characters in their default vector forms and only render the subset of required characters on demand, but then you might as well render them at the required pixel resolution and do away with the additional complexity of SDF rendering.
SDF is cool but not a generally good solution to GPU vector graphics. It only works for moderate scaling up before looking bad, the CPU prep figuring out the data the GPU needs takes far longer than just rasterizing on CPU would, etc. It's great as a model for games where there are many renders as world position changes but that's about it.
I don't know the details of using SDFs (especially MSDF!) for vector graphics, but my understanding is that it's essentially a precomputation that _already_ involves a rasterization.
I would like to know why you think the described approach is silly? It doesn't involve a final rasterization but merely a prefiltering of segments.
I did vector graphics using SDFs in this library (https://github.com/audulus/vger). Works pretty well for my uses which are rendering dynamic UIs, not rendering SVGs. But I can still do some pretty gnarly path fills!
Here's the approach for rendering path fills. From the readme:
> The bezier path fill case is somewhat original. To avoid having to solve quadratic equations (which has numerical issues), the fragment function uses a sort-of reverse Loop-Blinn. To determine if a point is inside or outside, vger tests against the lines formed between the endpoints of each bezier curve, flipping inside/outside for each intersection with a +x ray from the point. Then vger tests the point against the area between the bezier segment and the line, flipping inside/outside again if inside. This avoids the pre-computation of Loop-Blinn, and the AA issues of Kokojima.
It works pretty well, and doesn't require as much preprocessing as the code in the article. Also doesn't require any GPU compute (though I do use GPU compute for some things). I think ultimately the approach in the article (essentially Piet-metal, aka tessellating and binning into tiles) will deliver better performance, and support more primitives, but at greater implementation complexity. I've tried the Piet-metal approach myself and it's tricky! I like the simpler Shadertoy/SDF inspired approach :)
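The chord part of the quoted test is a classic even-odd ray cast. A hedged sketch in Python (straight chords only; the second flip vger does for the region between the bezier segment and its chord is omitted here):

```python
def inside_polygon(px, py, pts):
    """Even-odd test: flip inside/outside for each +x ray crossing of the
    closed polyline through pts."""
    inside = False
    n = len(pts)
    for i in range(n):
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
        # Does this edge straddle the horizontal line y = py?
        if (y0 > py) != (y1 > py):
            # x coordinate where the edge crosses that line
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > px:   # the crossing lies on the +x ray
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside_polygon(2, 2, square), inside_polygon(5, 2, square))  # True False
```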
I don't want to be the guy that doesn't read the entire article, but the first sentence surprised me quite a bit:
> Despite vector graphics being used in every computer with a screen connected to it, rendering of vector shapes and text is still mostly a task for the CPU.
Do modern vector libraries really not use the GPU? One of the very first things I did when learning Vulkan was to use a fragment shader to draw a circle inside a square polygon. I always assumed that we've been using the GPU for pretty much any sort of vector rasterization, whether it was bezier curves or font rendering.
SVG paths can be arbitrarily complex. This article really doesn't discuss any of the actual hard cases. For example, imagine the character S rotated 1 degree and added to the path on top of itself in a full rotation. This is one path composed of 360 shapes. These begin and end fill sections (winding order changes) coincide in the same pixels at arbitrary angles (and the order of the hits is not automatically sorted!) but the final color cannot be arrived at correctly if you do not process all of the shapes at the same time. If you do them one at a time, you'll blend tiny (perhaps rounded to zero) bits of color and end up with a mess that looks nothing like what it should. These are often called conflation artifacts IIRC.
There's way more to this than drawing circles and rectangles, and these hard cases are why much of path / vector graphics filling still ends up being better on CPU where you can accumulate, sort, etc which takes a lot of the work away. CPU does basically per-Y whereas this is GPU per-pixel so perhaps they're almost equal if the GPU has the square of a CPU power. Obv this isn't quite right but gives you an idea.
Video discussing path filling on CPU (super sampling and trapezoid): https://youtu.be/Did21OYIrGI?t=318 We don't talk about the complex cases but this at least may help explain the simple stuff on CPU for those curious.
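The "process all of the shapes at the same time" requirement can be sketched for a single scanline: gather every edge crossing from the whole path, sort them, accumulate the winding number, and only then decide coverage. A toy Python version with vertical edges only:

```python
def scanline_spans(edges, y):
    """edges: list of (x, y0, y1) vertical edges; direction = sign(y1 - y0).
    Returns the filled x-spans of one scanline under the nonzero rule."""
    crossings = []
    for x, y0, y1 in edges:
        if min(y0, y1) <= y < max(y0, y1):
            crossings.append((x, 1 if y1 > y0 else -1))
    crossings.sort()
    spans, winding, start = [], 0, None
    for x, d in crossings:
        was_inside = winding != 0
        winding += d
        if not was_inside and winding != 0:
            start = x
        elif was_inside and winding == 0:
            spans.append((start, x))
    return spans

# Two overlapping squares, [0,4]x[0,4] and [2,6]x[0,4], submitted as one path
# (only vertical edges matter for a horizontal scanline):
edges = [(0, 0, 4), (4, 4, 0), (2, 0, 4), (6, 4, 0)]
print(scanline_spans(edges, 2.0))  # [(0, 6)]
```

One span over the union, filled exactly once; blending the two squares one at a time would composite the overlap region twice.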
Skia mostly uses the CPU -- it can draw some very basic stuff on the GPU, but text and curves are a CPU fallback. Quartz 2D is full CPU. cairo never got an acceptable GPU path. Direct2D is the tessellate-to-triangle approach. If you name a random vector graphics library, chances are 99% of the time it will be using the CPU.
3D vector graphics are not as full featured as 2d vector graphics.
2d vector graphics include things like "bones" and "tweening", which are CPU algorithms. (Much like how bone processing in 3d world is also CPU-side processing).
---------
Consider the creation of a Bézier curve, in 2d or 3d. Do you expect this to be a CPU algorithm, or a GPU algorithm? Answer: clearly a CPU algorithm.
GPU algorithms are generally triangle-only, or close to it (ex: quads), as far as geometry goes. Sure, there are geometry shaders, but I don't think it's common practice to take a Bézier curve definition, write a tessellator shader for it, and output (in parallel) a set of vertices. (And if someone is doing that, I'm interested in hearing / learning more about it. It seems like a parallelizable algorithm to me, but the devil is always in the details...).
A few of the previous approaches are mentioned in "Other work" near the end. And from reading a few articles on the topic I got the impression that, yes, drawing a single shape in a shader seems almost trivial, but vector graphics in general means mostly what PostScript/PDF/SVG are capable of these days. This means you don't just need filled shapes, but also strokes (and stroking in itself is a quite complicated problem), including dashed lines, line caps, etc. Gradients, image fills, and blending modes are probably on the more trivial end, since I think those can all be easily solved in shaders.
There's definitely a lot of code out there that still does this only on the CPU, but the optimized implementations used in modern OSes, browsers and games won't.
The best GPU vector rendering library I have seen is https://sluglibrary.com/. The principal use case is fonts, but it appears that the underlying mechanism can be used for any vector graphics.
I think the issue with slug is that it requires a fair amount of pre-computation. So it's great for its use case: rendering glyphs, especially on surfaces in games.
A possibly dumb question. GPUs are really, really good at rendering triangles. Millions of triangles per second good. Why not convert a vector path into a fine enough mesh of triangles/vertexes and make the GPU do all the rasterization from start to finish instead of doing it yourself in a pixel shader?
You can do that, except now you've moved the bulk of the work from the GPU to the CPU -- triangulation is tricky to parallelize. And GPUs are best at rendering large triangles -- small triangles are much trickier since you risk overdraw issues.
Also, typical GPU triangle antialiasing like MSAAx16 only gives you 16 sample levels, which is far from the quality we want out of fonts and 2D shapes. We don't have textures inside the triangles in 2D like we do in 3D, so the quality of the silhouette matters far more.
That said, this is what Direct2D does for everything except text.
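The sample-count limit mentioned above is easy to quantify: 16 samples can report only 17 distinct coverage levels, so a smooth analytic alpha ramp becomes visibly stepped. A small Python illustration:

```python
def msaa16(cov):
    # Snap analytic coverage to the nearest of the 17 levels 0/16 .. 16/16.
    return round(cov * 16) / 16

exact = [i / 100 for i in range(101)]          # a smooth alpha ramp
levels = sorted(set(msaa16(c) for c in exact))
print(len(levels))  # 17
```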
8 or 9 years ago I had need to rasterize SVG for a program I had written back then and looked into gpu vs cpu, but a software rasterizer ended up being fast enough for my needs and was simpler, so I didn't dig any further.
At the time I looked at an nvidia rendering extension, which was described in this 2012 paper: https://developer.nvidia.com/gpu-accelerated-path-rendering
In addition to the paper, the linked page has links to a number of youtube demos. That was 10 years ago, so I have no idea if that is still a good way to do it or if it has been superseded.
There's an OpenGL-ish GPU graphics library (whose name I can't currently remember) that's in Mesa, but not built by default in most distros, and IIRC it's also supported on Raspberry Pi.
I played with it a bit, wrote a python wrapper for it, borked a fedora install trying to get real gpu support, fun times all around. Seems nobody cares about an accelerated vector graphics library.
Not exactly - the article you link to is about SVG/CSS filters, not path drawing. Modern Chrome (skia) supports accelerated path drawing but only some of the work is offloaded to the GPU. In even older Chrome the GPU was used for compositing bitmaps of already-rendered layers.
I’m glad it seems more and more people are looking into rendering vector graphics on the GPU.
Has anyone done any image comparisons between CPU and GPU rendering? I would be worried about potential quality and rendering issues of a GPU-rendered image vs a CPU-rendered reference image.
The interesting primitives are: add, mul, fma, sqrt. All of these are mandated by IEEE 754 to be correctly rounded. While GPUs have been found to cut corners in the past, I wouldn't worry too much about it.
Shouldn't a GPU render (given a correct algorithm implementation) be more correct in environments where zooming and sub-pixel movements are common (eg. browsers)? The GPU runs the mathematical computations every frame for the exact pixel dimensions, while the CPU may often use techniques like upscaling.
There's nothing to worry about. You can do the same things on the GPU as on the CPU. The tricky part is finding a good way to distribute the work on many small cores.
> a very good performance optimization is not to try to reject segments on the X axis in the shader. Rejecting segments which are below or above current pixel boundaries is fine. Rejecting segments which are to the right of the current pixel will most likely increase shader execution time.
Thread groups are generally rectangular IME--nv is 8x4, others 8x8. So it doesn't make sense to distinguish X from Y in this respect. But yes, you do want a strategy for dealing with 'branch mispredictions'. Buffering works, and is applicable to the cpu too.
Is there no built-in GPU path draw command? Seems like it would be similar (although not identical) to what the GPU does for vertex visibility.
Especially when you consider what tile based renderers do for determining whether a triangle fully covers a tile (allowing rejection of any other draw onto that tile) it seems like GPUs could have built in support for 'inside a path or outside a path.' Even just approximating with triangles as a pre-pass seems faster than the row based method in the post.
Are arbitrary paths just too complex for this kind of optimization?
From my understanding there is no closed-form solution for arbitrary paths defined in that way. So the only way to figure out what the shape looks like, and whether a point is inside or outside, is to run all the commands that form the path.
> ... seems faster than the row based method in the post.
But the row-based method in the post is not what they describe doing on the GPU version of the algorithm. The row-based method is their initial CPU-style version.
The GPU version handles each pixel in isolation, checking it against the relevant shape(s).
At least, if I understand things correctly (:
As far as I can tell, the approach described here is probably similar to what a built-in "draw path" command would do. Checking if something is inside a triangle is just extremely easy (no concavity, for instance) and common, and more complex operations are left up to shader developers — why burn that special-case stuff into silicon?
What I don't understand - why is there this "cover table" with precomputed per-cell-row coverage? I.e. why is the cover table computed per-cell-row when the segments are specialized per-cell? There is the paper "ravg.pdf" that gets by with basically one "cover" integer per tile, plus some artificial fixup segments that I believe are needed even in the presence of such a cover table. I'm probably missing something, someone who is deeper in the topic please enlighten me?
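For what it's worth, the "one cover integer per tile" idea from ravg.pdf can be sketched as a per-tile "backdrop" winding: each tile stores the winding contributed by all edges strictly to its left, so a tile with no local segments can be filled or skipped outright without touching any geometry. A hedged 1D toy in Python (one tile row, vertical edge crossings only):

```python
TILE = 16

def backdrops(edge_crossings, n_tiles):
    """edge_crossings: list of (x, direction) where an edge crosses this
    tile row. Returns the winding at the left boundary of each tile."""
    result = []
    for j in range(n_tiles):
        tile_left = j * TILE
        result.append(sum(d for x, d in edge_crossings if x < tile_left))
    return result

# A shape spanning x in [10, 40): up edge at x=10, down edge at x=40.
crossings = [(10, +1), (40, -1)]
print(backdrops(crossings, 4))  # [0, 1, 1, 0]
```

Tile 1 (x in [16, 32)) contains no segments but starts with winding 1, so it is a solid fill; tiles that also contain local segments combine this backdrop with their own per-pixel cover.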
Does this mean it is mostly done in the fragment shader and there is no tessellation, like how Bezier patches are rendered in 3D land? That's quite different from what I thought I knew.
Lichtso | 3 years ago:
Here, IMO a good blog post about the hard problems of getting alpha blending and coverage right: https://ciechanow.ski/alpha-compositing/
pyrolistical | 3 years ago:
Shader toy demo: https://www.shadertoy.com/view/sldyRj
johndough | 3 years ago:
SDFs are still useful though if you have to render graphics at many different resolutions, for example on signs in computer games, as seen in the original Valve paper https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007...
tayistay | 3 years ago:
https://github.com/audulus/vger
and a rust version:
https://github.com/audulus/vger-rs
(which powers my rust GUI library: https://github.com/audulus/rui)
tayistay | 3 years ago:
I do something a bit like slug, but I'm sure slower, since slug is very optimized. (https://github.com/audulus/vger)
bane | 3 years ago:
https://youtu.be/-ZxPhDC-r3w
bsder | 3 years ago:
His latest paper is about how to handle stroking of cubic splines: https://arxiv.org/abs/2007.00308
He gives it as a talk, but you have to sign up with NVIDIA: https://developer.nvidia.com/siggraph/2020/video/sig03-vid
genpfault | 3 years ago:
Mesa removed support in 2015: https://docs.mesa3d.org/relnotes/10.6.0.html
> Removed OpenVG support.
pier25 | 3 years ago:
Same with the FF Rust renderer (sorry don't remember the name).
stefanfisk | 3 years ago:
I could certainly be wrong though.