
FOVO: A new 3D rendering technique based on human vision

136 points | smusamashah | 5 years ago | gamasutra.com

83 comments

[+] visualphoenix|5 years ago|reply
I tried looking up a technical publication on this technique but the journal article is straight trash [0]. That said, I found a useful post where folks were trying to explore this style of rendering [1]. It’s interesting to see that in the demo videos the cameras move in a straight path. Apparently there is some wonkiness that comes from changing the view orientation.

From [1], I’ll close by cross-posting some useful links: experimental rendering techniques with Quake [2], Minecraft [3], and a cool article about visualizing projections [4].

[0] https://www.scienceopen.com/document_file/4e8c9d47-51f4-4145...

[1] https://forums.chaosgroup.com/forum/chaos-common/chaos-commo...

[2] https://github.com/shaunlebron/blinky

[3] https://github.com/shaunlebron/flex-fov

[4] http://shaunlebron.github.io/visualizing-projections/

[+] rsp1984|5 years ago|reply
Most comments here point out that this may be just a fish-eye camera model with some fancy sales language around it. This was my initial thought as well; however, it should be pointed out that changing from linear to fish-eye doesn't cause any changes in scene occlusion: it's just moving pixels on a 2D plane. The technique described here, however, does change occlusion.

Based on the examples on the page, here's how I think it actually works: the projection center is not a single point in space, but varies along the optical axis depending on the position of the 3D vertex as seen from the camera. More specifically: for each vertex a preliminary linear projection is computed to get an XY position, then the projection is recomputed using that preliminary XY position, where things near the center are re-projected from a projection center moved forward along the optical axis, and things near the borders/corners of the image have it moved backward. On top of this there is then probably a standard fish-eye.

That said, I'm not sure this whole projection trickery is really worth it, given that it comes with its own issues (your scene will appear to warp when you turn the camera) and given that you can probably get 90% of the value with one of the more standard fish-eye models.
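A minimal numeric sketch of the guess above (function name, the clamping, and the `shift` amount are all made up; this only illustrates a projection center that slides along the optical axis, not FOVO itself):

```python
import math

def project_fovo_guess(x, y, z, f=1.0, shift=0.2):
    """Sketch of the guess above: the projection center slides along
    the optical axis depending on where a preliminary pinhole
    projection lands on the image plane. Camera looks down +z."""
    # Preliminary standard pinhole projection.
    u, v = f * x / z, f * y / z
    # Radial distance of the preliminary projection from the image
    # center, clamped so 0 = center, 1 = roughly the image corner.
    r = min(math.hypot(u, v), 1.0)
    # Move the projection center forward for central points and
    # backward for peripheral ones (the amounts are arbitrary here).
    dz = shift * (1.0 - 2.0 * r)
    # Re-project from the displaced center; a standard fish-eye
    # mapping could then be applied on top.
    return f * x / (z - dz), f * y / (z - dz)
```

Note that because `dz` depends on depth through the preliminary projection, two points on the same original camera ray can end up at different image positions, which is exactly the occlusion change a pure screen-space warp cannot produce.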

[+] TeMPOraL|5 years ago|reply
> That said, I'm not sure this whole projection trickery is really worth it

I think it might be. Look at the first GIF in the article, the one showing the room. Try to imagine yourself standing in that room. Doesn't the FOVO-corrected picture look pretty much like what you'd see with your own eyes if you were the camera? It definitely looks like that for me.

Using standard projections for rendering, there's no FOV value that would produce this effect - at a typical videogame FOV of 90°, you see only a fragment of what your eyes would perceive, and as you increase FOV, distortions become really bad.

I went into more details on this here: https://news.ycombinator.com/item?id=26805010.

[+] onhn|5 years ago|reply
It is interesting that they chose a first-person shooter as an example screenshot, since gameplay would be broken by visual changes in occlusion.
[+] amelius|5 years ago|reply
> your scene will appear to warp when you turn the camera

But only near the edges, I suppose?

Human vision is not very detailed near the edges of the field of view, so perhaps a reproduction of that vision shouldn't be either?

[+] klodolph|5 years ago|reply
Your scene will appear to warp when you turn the camera anyway, unless you put your eye at the one point in space where it matches the projection. Few people put their eyes so close to the screen.
[+] a_e_k|5 years ago|reply
This looks an awful lot like a variation of the Panini projection described in this paper [1], which I implemented for RenderMan [2].

The tell-tale sign to me is that the perspective lines that converge towards the vanishing point remain straight, while the other lines bow. You can see it pretty clearly in the last two pictures on their archvis page [3].

[1] http://tksharpless.net/vedutismo/Pannini/panini.pdf

[2] https://rmanwiki.pixar.com/display/REN23/PxrPanini

[3] https://www.fovotec.com/architectural
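For reference, the basic Pannini mapping from the paper in [1] fits in a few lines (a hedged sketch; parameter names are mine, and `d` is the distance of the projection point behind the cylinder's axis):

```python
import math

def panini(azimuth, elevation, d=1.0):
    """Basic Pannini projection per the paper in [1]: view rays are
    projected onto a vertical cylinder, then re-projected from a
    point a distance `d` behind the cylinder's axis (names mine).
    Angles are in radians."""
    s = (d + 1.0) / (d + math.cos(azimuth))  # common scale factor
    return s * math.sin(azimuth), s * math.tan(elevation)
```

Setting `d = 0` recovers an ordinary rectilinear projection (`x = tan(azimuth)`), which is why verticals and lines through the central vanishing point stay straight while other lines bow.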

[+] vkoskiv|5 years ago|reply
> which I implemented for RenderMan

Cool, you work at Pixar? I've been working on a small hobby renderer [1] for a few years, and RenderMan has been a big inspiration!

[1] https://github.com/vkoskiv/c-ray

[+] mkl|5 years ago|reply
It seems like it's supposed to be "Pannini", after a painter (https://en.wikipedia.org/wiki/Image_stitching#Pannini), and that's what the original paper says.

But when the lead author spells it both ways in the same URL, and many software implementations use "Panini", it gets a bit confusing!

[+] rsp1984|5 years ago|reply
The Pannini projection doesn't change occlusions, though, unlike the technique described in the article.
[+] virtualritz|5 years ago|reply
My BS-o-meter went red when I read this:

> This demonstrates that the FOVO process is not simply ‘warping’ a 2D render in screen space but is applying nonlinear transformations to the entire 3D geometry.

This seems to conflate an implementation detail with what is actually happening here.

It doesn't matter whether what you do is some non-linear 2D projection applied to 3D vertices or to 2D pixels (or to 2D projected vertices). The 'light rays' we concern ourselves with don't bend around. I.e., this is just another non-linear projection.

All this sounds to me like fluff to bolster a business/patent application for what is essentially a fancy fisheye transform.

[+] Grustaf|5 years ago|reply
I think the point is not to talk about how it's implemented, but to point out that it's a more complex change, one that cannot be achieved by first performing a traditional render and then transforming the resulting 2D image. You actually need to project differently.
[+] rubatuga|5 years ago|reply
Yeah, it’s BS. There are a bunch of fisheye transforms, including stereographic, equidistant, and equisolid-angle.
[+] steerablesafe|5 years ago|reply
"emulating human vision" is a distraction here. If you have a flat 22 inch 16:10 display that you view from a 1 m distance, dead centre, then the most natural way to render is to use linear projection with a FOV of ~26 degrees, which is absolutely tiny. But this matches the same view as if you were looking out a window the same size as your display.
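The ~26° figure checks out directly (a sketch; assumes the 22 inches is the diagonal and the viewer sits exactly 1 m away, dead centre):

```python
import math

# A 22" 16:10 display: the diagonal is 22 inches, so the width is
# 22 * 16 / sqrt(16^2 + 10^2) inches.
width_m = 22 * 0.0254 * 16 / math.hypot(16, 10)
distance_m = 1.0  # viewing distance, dead centre

# Horizontal FOV of the screen treated as a window onto the scene.
hfov_deg = math.degrees(2 * math.atan((width_m / 2) / distance_m))
print(round(hfov_deg, 1))  # ~26.7
```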

Since such a tiny FOV is impractical, any rendering that wants to cover more FOV has to distort the view some way. There will be regions on the screen with more or less distortions, this becomes a trade-off. Arguably linear projection with a large FOV does a poor job at this.

I also wonder if there are projection techniques that specifically optimize for curved screens.

[+] Grustaf|5 years ago|reply
Pretending that the screen is just a window into the real world might be the most technically correct way to do it, but that doesn't mean it's the one we will experience as most real or most engaging.
[+] TeMPOraL|5 years ago|reply
> "emulating human vision" is a distraction here.

Is it though? I read your comment first and agreed with what you wrote, but then I saw the GIF at the top of the article and instantly got what they're doing. They're solving what I call the problem of "taking a photo of what you see".

(I don't know what the professional vocabulary is for this, so please excuse the long-winded explanation.)

Here's the scenario: when I sit on the chair and look straight forward without moving, I can see a certain area of the room in a certain way. If I take out my phone, place it where my eyes are and take a photo, the resulting image looks nothing like what I just saw from that position. The biggest difference is, it captures much less of the field of view, but also the angles look wrong.

I also repeated this "experiment" virtually. The other day, I was doing some interior design in Blender, and created a room and a corridor using real dimensions. I tried, out of curiosity, to render a view from the perspective of myself (using my eye height). I explored camera configurations for a good hour, and couldn't find anything that would render the image corresponding to how I would perceive that space in real life. Again, the main problem was, either the field of view was too small, or angles were too badly distorted.

Of course a flat image of a scene, viewed from distance, is not the same as seeing the scene itself. Both my phone and Blender are correct. Perhaps they reproduce exactly what I'd see if my phone and screen were windows to the scene. But that's not what I'm after, that's not how my brain tries to interpret these images[0]! Instead, my mind tries to comprehend the image as if my eyes were the camera. It does some weird and hard to describe mapping[1], trying to put itself in the place of the camera, and immediately notices that the image doesn't look like what eyes would perceive.

When I saw that GIF showing a room rendered with hFOV=160°, the FOVO variant immediately clicked. My brain took that image, did its weird mapping, and concluded that this is exactly how the room would look to me if I were standing where the camera is!

So in my books, whatever they're doing, it's a big step in the right direction. These corrected renders feel like what I'd see if I were the camera. For games, this could improve immersion when playing on a flat screen. Outside of games, such a transform on a photo would help people communicate their perception of the things they photograph.

--

[0] - In this case. I have no problem seeing a picture as a picture, but I can also try to see it as recording of perception. It's a choice I can make. I assume that's pretty standard and everyone can do this if the image looks close enough to first-person view.

[1] - The best description I can come up with - and mind you, I'm going off my feelings and introspection - is as follows. Assume a simplified model of sight as made of 3 layers. In layer 1, you have raw input signals from the eyes, which give you two tiny images. Layer 2 expands these images into a unified, much larger image, correcting for saccades and filling in blanks via memory and inference. Then layer 3 tries to understand the image.

What my brain seems to do, when I'm trying to comprehend a first person perspective render or a photo, is identifying that artificial picture, isolating it and spawning a subprocess running just layer 3, feeding that artificial picture as its input, as if it came from the layer 2.

[+] doncarbon|5 years ago|reply
Hmm, sounds like a lot of sales talk around what seems to be a "nice fisheye effect".

The first gif of that room in the article shows exactly the problem: rasterizing works by transforming a bunch of triangles from world space to view space.

This works with rectilinear projection because each triangle can be transformed into a different shaped triangle, based on the perspective of the camera. You can't transform a triangle into a "bendy" triangle. And the screenshots show "bendy" lines.

So, if you want a "fisheye" effect in rasterization, you first need to render a rectilinear image, and then distort it, leaving you with a blurry center due to a lack of resolution.
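That two-pass pipeline can be sketched in 1D (hedged: an equidistant fisheye model, with my own naming). Each output coordinate becomes a view angle and is then looked up in the pre-rendered rectilinear image; near the centre the map's slope is well below 1, so several output pixels sample the same source pixel, which is the resolution loss described above.

```python
import math

def fisheye_remap_1d(out_coords, theta_max):
    """Post-process fisheye (equidistant model) as a pure remap, in
    1D: each output coordinate in [-1, 1] becomes a view angle, which
    is then looked up in the pre-rendered rectilinear image (also
    scaled to [-1, 1]). Names are mine."""
    f = 1.0 / math.tan(theta_max)  # rectilinear edge lands at +/-1
    return [f * math.tan(c * theta_max) for c in out_coords]
```

With a half-FOV of 80°, the slope at the centre is roughly 0.25, i.e. about four output pixels share each source pixel there: the blurry centre.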

[+] eurekin|5 years ago|reply
I only skimmed the article, but among previous solutions there were attempts to use real subdivision and displacement of the original geometry.

The article does mention this line:

> These subtle adjustments of the 3D space are being applied volumetrically, as can be seen from the way the occlusion paths in the scene are changing. This demonstrates that the FOVO process is not simply ‘warping’ a 2D render in screen space but is applying nonlinear transformations to the entire 3D geometry.

I'm not sure how to interpret this in real terms, but it does seem to suggest that it might be more than a simple 2D transform.

[+] PythagoRascal|5 years ago|reply
I mean, interesting article and all, but I wish they would have used a labeled side-by-side comparison for the renders instead of a GIF. It's tedious to figure out which is which like this.
[+] sxp|5 years ago|reply
https://www.fovotec.com/architectural has a more useful demo than the basic GIFs on that page.
[+] divan|5 years ago|reply
Thank you! I was struggling to understand which of the pictures in those GIFs is standard and which is FOVO. This demo is super clear.
[+] teraflop|5 years ago|reply
> The result is FOVO, a new method of rendering 3D space that emulates the human visual system, not cameras.

I don't buy this part. Sure, the human visual system does all kinds of processing that causes us to perceive our surroundings differently than we do a flat image on a screen. But the input to the visual system is still an image on your retina, which obeys the same laws as camera optics.

If you ignore depth of field (which the authors don't seem to be concerned with) then the human eye behaves essentially like a pinhole camera. No matter what happens behind the pinhole, all of the light rays entering the eye must pass through the pinhole from the environment, traveling in straight lines. For any given viewpoint, the relationships of which objects are occluded by which other objects will be exactly the same for an eye as for a camera. But the authors specifically point out that their algorithm doesn't preserve these relationships.

Of course, computer graphics is as much an art as a science. If deviating from the realistic model turns out to give aesthetically pleasing results, then by all means go for it. But the reason it's better wouldn't have anything to do with more closely mimicking what the human eye actually perceives.

In any case, I would question the aesthetic benefits. To my eye, the algorithm seems to distort relative shapes and sizes of objects in a weird way. It looks great in screenshots, but when moving through a scene[1], it creates a subtly unsettling "space-warping" effect.

I also think the comparisons in the article are a little disingenuous, because everyone knows that linear perspective projections with wide FOV look horrible. I'd like to see a comparison against something else, like a stereographic or fisheye projection, which would be both more physically realistic and more efficient to render.

[1]: https://www.fovotec.com/architectural

[+] SamBam|5 years ago|reply
> No matter what happens behind the pinhole, all of the light rays entering the eye must pass through the pinhole from the environment, traveling in straight lines. For any given viewpoint, the relationships of which objects are occluded by which other objects will be exactly the same for an eye as for a camera.

This is too simplified.

1. The light falls onto a curved surface of the retina, not a flat screen behind it, but more importantly

2. Our brains interpret the light that falls and create the experience of a 3D space in front of us.

If you actually simply flattened out the retina and mapped out, point to point, the light that falls, it would look a bit like their linear perspective example. The center would be clear, and the edges would be horribly stretched. And yet that's nothing like the way we perceive our vision. Looking at the image on the back of our retina rolled out like a painting, we wouldn't recognize it at all as what we see.

This is attempting to create a flat image that looks like what we see in front of us when we're standing in a space.

[+] Gauge_Irrahphe|5 years ago|reply
>But the input to the visual system is still an image on your retina, which obeys the same laws as camera optics.

The thing that makes the difference is that the resolution of the retina drops as you move off the center. Which I suppose is what is being simulated here. Or at least it could be efficiently simulated that way - like foveated rendering, only the fovea is kept at the center and the rest of the image is kept with the pixels smashed closer together instead of interpolated.

[+] tomc1985|5 years ago|reply
How different is this from a subtle fisheye lens effect? I feel like I saw such a thing (Fisheye projection) rendered in the original Quake years ago and it looks very similar to the outputs in the article

edit: http://strlen.com/gfxengine/fisheyequake/

[+] rafaelvasco|5 years ago|reply
It looks nothing like it. FOVO looks perfectly natural to me. Fish eye doesn't. I'm baffled by the comments on this post. This is not a fisheye effect at all. It's visually clear to me.
[+] crazygringo|5 years ago|reply
Unfortunately this article doesn't explain what FOVO actually does differently, just that it's not normal linear perspective.

I'm guessing what it's doing is the effect you get from when you take a panorama photo with your phone, so each column in the image is rendered as if in the center of a lens rather than at a side -- thus producing the curved lines we associate with panoramas, and that seem to be in these photos.

But that's not new, so what did they supposedly invent? Is it actually some new kind of projection that isn't the same as how panorama photos work? Or is it just that, but nobody had built it for 3D rendering software before and they did? Or people had, but they made it work in real-time?
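If that guess is right, the column-by-column panorama is just a cylindrical projection, which can be sketched like this (names are mine; camera looks down +z):

```python
import math

def cylindrical_project(x, y, z):
    """Cylindrical ("phone panorama") projection: every image column
    is a rectilinear view aimed straight at that column."""
    u = math.atan2(x, z)        # columns spaced evenly in angle
    v = y / math.hypot(x, z)    # rectilinear vertically, per column
    return u, v
```

Vertical lines stay straight (within a column, `v` depends only on elevation) while horizontal lines bow, which is the characteristic panorama look.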

[+] jml7c5|5 years ago|reply
It looks like it's essentially just a tessellation + vertex shader run on the entire scene. Say you have a lamp-post mesh in your scene. Rather than drawing polygons with straight lines (normal), or drawing polygons with curved lines (difficult to accelerate on GPUs), you deform the lamp-post mesh to give the appearance of a curved image. The obvious issue is that you have to massively increase the polygon count to create a smooth surface.

If this is the case, I don't think it's a new idea.

It's unclear to me how the fovea enters into it.
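The tessellate-then-deform idea can be sketched like this (hedged: the `warp` function is an arbitrary stand-in, not FOVO's actual mapping):

```python
import math

def subdivide(p0, p1, n):
    """Split the edge p0 -> p1 into n segments (simple tessellation)."""
    return [tuple(a + (b - a) * i / n for a, b in zip(p0, p1))
            for i in range(n + 1)]

def warp(p, k=0.1):
    """An arbitrary nonlinear 'vertex shader' stand-in: pull points
    toward the optical axis by a radius-dependent factor."""
    x, y, z = p
    s = 1.0 - k * math.hypot(x, y)
    return (x * s, y * s, z)

# A straight edge only renders as a curve if it has interior vertices
# to displace, which is why the scene must be tessellated first.
edge = subdivide((-1.0, 1.0, 5.0), (1.0, 1.0, 5.0), 8)
warped = [warp(p) for p in edge]
```

Without the subdivision, the edge has only two vertices and rasterizes as a straight line no matter what the shader does; hence the polygon-count cost mentioned above.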

[+] chrispine|5 years ago|reply
Can I see a video of turning/moving through a game with this? A static image isn't enough for me to tell how this would actually feel.
[+] d--b|5 years ago|reply
Maybe it's sales BS as others have commented. Maybe it's just a fisheye effect. But I sure wish this kind of projection were the default for video games...
[+] djmips|5 years ago|reply
Particularly if you have a very wide screen or multi monitor widescreen setup.
[+] Asooka|5 years ago|reply
This looks a lot like the quincuncial projection applied to a cube map render, with some custom logic to subtly warp geometry in a stable way so as to make certain details pop based on distance. You can try for yourself by downloading blinky (formerly fisheye quake) [1] and experimenting with the projections. That said, it's probably not just a projected cube map. Doing so will introduce noticeable sampling artifacts, so I am guessing they also do something more to address such issues.

[1] https://github.com/shaunlebron/blinky

[+] Synaesthesia|5 years ago|reply
Wow I checked that out and it's pretty close indeed. The Quake game is pretty nausea inducing but I think that's a side effect of the extreme FOVs (180+ degrees).
[+] Daub|5 years ago|reply
Of relevance, a wonderful study on the limits of human vision by NASA:

https://ntrs.nasa.gov/citations/19730006364

This image is a visual summary... all you need to know about the ‘vision frame’. The grey area is where the limits of vision and the limits of perception meet. Simply put: we can see things we can’t perceive.

https://craftofcoding.files.wordpress.com/2019/05/nasavisual...

I use this when discussing composition with design students. The problem: design is rectangle-based, but human vision is not.

I became interested in this topic when I encountered a student of mine who, due to a neurological deficiency, could see things out of one eye but not understand them.

[+] etaioinshrdlu|5 years ago|reply
This is rather fascinating. I am shocked that such a relatively simple rendering tweak hasn't been widely used in games and movies to date.

I'm guessing that the math is not describable using a transformation matrix. It might have been too heavy for very old graphics hardware, but any GPU from the last 15 years should be able to do this kind of thing no problem.

Here's a minecraft video where you can see the effect in real-time as well as pushing the field of view too far: https://www.youtube.com/watch?v=ZNSW6Ga7Wmo

[+] fxtentacle|5 years ago|reply
I believe most of this can be emulated nicely by using lens ST maps to assign custom camera ray directions for each pixel.

Their demo seems to also offset the camera ray source a bit, so one would need another ST map for that. But all in all, this looks like someone could code a V-Ray plugin to implement it in a weekend.
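A hedged sketch of that per-pixel ray table (all names are mine, and the `warp` callable stands in for whatever custom lens mapping an ST map would encode):

```python
import math

def ray_table(width, height, hfov_deg, warp=lambda a: a):
    """Per-pixel camera-ray directions in the spirit of a lens ST
    map: each entry is the unit vector a ray tracer would shoot for
    that pixel. `warp` remaps the view angle (identity = evenly
    spaced angles)."""
    half = math.radians(hfov_deg) / 2
    table = []
    for j in range(height):
        row = []
        for i in range(width):
            # Horizontal/vertical view angles for this pixel.
            ax = warp((2 * i / (width - 1) - 1) * half)
            ay = warp((2 * j / (height - 1) - 1) * half * height / width)
            d = (math.sin(ax), math.sin(ay), math.cos(ax) * math.cos(ay))
            n = math.sqrt(sum(c * c for c in d))
            row.append(tuple(c / n for c in d))  # normalize to unit length
        table.append(row)
    return table
```

Offsetting the ray *source* per pixel, as their demo seems to need, would indeed take a second table of the same shape.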

I'd be truly interested to learn how this works out for them, business-wise. I don't think they can patent lens distortion and I don't think the implementation is difficult enough to protect against copycats. So how do they win the competition?

[+] splittingTimes|5 years ago|reply
Is my understanding correct that this is not for 3D views of single models (as in CAD applications), but rather geared towards landscape views with high view angles (>90°), most notably in games?
[+] fxtentacle|5 years ago|reply
Yes, they are compensating for the distortion produced by wide-angle lenses, and if you don't use a wide rendering angle (e.g. in CAD software) the effect will be minimal.
[+] nayuki|5 years ago|reply
> These subtle adjustments of the 3D space are being applied volumetrically, as can be seen from the way the occlusion paths in the scene are changing. This demonstrates that the FOVO process is not simply ‘warping’ a 2D render in screen space but is applying nonlinear transformations to the entire 3D geometry.

I think this is an inexact hack to implement FOVO in rasterization engines. If raytracing were used, there would be no need to distort the geometry; you'd only need to distort the camera rays.

[+] sp332|5 years ago|reply
It changes the occlusion paths too, so you'd also need to change the position the rays are sent from to get the same effect.