tel|1 month ago
For example, the camera orbits around the performers in this music video are difficult to imagine capturing in real space. Even if you could pull it off using robotic motion-control arms, it would require that the entire choreography be fixed in place before filming. This video clearly takes advantage of being able to direct whatever camera motion the artist wants within the 3d virtual space of the final composed scene.
To do this, the representation needs to estimate the radiance field, i.e. the amount and color of light visible at every point in the 3d volume, viewed from every angle. It's not possible to do this at high resolution by breaking the space up into voxels: dense voxel grids scale badly, O(n^3) in memory. You could instead try to guess at some mesh geometry and paint textures onto it that are compatible with the camera views, but that's difficult to automate.
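To put rough numbers on that O(n^3) blowup, here's a back-of-the-envelope sketch (the 16 bytes per voxel is my own assumption, and real radiance fields also need view-dependent color, which only makes it worse):

    def voxel_grid_bytes(n: int, bytes_per_voxel: int = 16) -> int:
        """Memory for a dense n x n x n grid: O(n^3)."""
        return n ** 3 * bytes_per_voxel

    for n in (128, 512, 2048):
        print(f"{n}^3 grid: {voxel_grid_bytes(n) / 1e9:.2f} GB")
    # 128^3 grid: 0.03 GB
    # 512^3 grid: 2.15 GB
    # 2048^3 grid: 137.44 GB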
Gaussian splatting estimates these radiance fields by assuming the radiance is built up from millions of fuzzy, colored balls positioned, stretched, and rotated in space. These are the Gaussian splats.
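In code, one splat is just a handful of parameters. A minimal sketch (the field names are mine, not from any particular implementation):

    from dataclasses import dataclass

    @dataclass
    class Splat:
        position: tuple[float, float, float]         # center of the Gaussian
        scale: tuple[float, float, float]            # stretch along each local axis
        rotation: tuple[float, float, float, float]  # orientation as a quaternion
        color: tuple[float, float, float]            # RGB; real systems often use
                                                     # spherical harmonics here for
                                                     # view-dependent color
        opacity: float                               # how solid the fuzzy ball is

The scale and rotation together define the Gaussian's covariance, which is what lets a single splat flatten into a disc or stretch into a needle.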
Once you have that representation, constructing a novel camera angle is as simple as positioning and angling your virtual camera and then recording the colors and positions of all the splats that are visible.
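In practice "recording the splats" means depth-sorting them and alpha-blending front to back. Here's a deliberately simplified sketch using the Splat class above: camera fixed at the origin looking down +z, isotropic splats with the rotation/stretch ignored. Real renderers rasterize anisotropic 2d ellipses on the GPU, but the compositing is the same idea:

    import math

    def render_pixel(splats, px, py, focal=500.0):
        ordered = sorted((s for s in splats if s.position[2] > 0),
                         key=lambda s: s.position[2])   # nearest first
        color, transmittance = [0.0, 0.0, 0.0], 1.0
        for s in ordered:
            x, y, z = s.position
            u, v = focal * x / z, focal * y / z         # pinhole projection
            sigma = focal * s.scale[0] / z              # projected splat radius
            d2 = (px - u) ** 2 + (py - v) ** 2
            alpha = s.opacity * math.exp(-d2 / (2 * sigma ** 2))
            for c in range(3):
                color[c] += transmittance * alpha * s.color[c]
            transmittance *= 1.0 - alpha                # remaining unblocked light
            if transmittance < 1e-4:                    # pixel is already opaque
                break
        return color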
It turns out that this approach is pretty amenable to techniques similar to modern deep learning: you basically train the positions/shapes/rotations of the splats via gradient descent. It's mostly been explored in research labs, but lately production-oriented tools have been built for popular 3d motion graphics packages like Houdini, making it more accessible.
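The training loop itself looks like any other gradient-descent optimization. A sketch in the spirit of the original 3D Gaussian Splatting paper (Kerbl et al. 2023), with `render` and `sample_training_view` standing in for the differentiable rasterizer and the captured-photo loader; both are assumed here, not real APIs:

    import torch

    n = 100_000  # splat count; real scenes often grow into the millions
    positions = torch.randn(n, 3, requires_grad=True)
    scales    = torch.zeros(n, 3, requires_grad=True)
    rotations = torch.randn(n, 4, requires_grad=True)
    colors    = torch.rand(n, 3, requires_grad=True)
    opacities = torch.zeros(n, 1, requires_grad=True)

    opt = torch.optim.Adam([positions, scales, rotations, colors, opacities],
                           lr=1e-3)

    for step in range(30_000):
        cam, photo = sample_training_view()          # one real captured image
        pred = render(positions, scales, rotations,  # differentiable splat
                      colors, opacities, cam)        # rasterizer (assumed)
        loss = (pred - photo).abs().mean()           # photometric L1 loss
        loss.backward()
        opt.step()
        opt.zero_grad()

The real method also periodically clones, splits, and prunes splats based on their gradients, but the core is exactly this: the splat parameters are tensors, and the captured photos supervise them.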
pleurotus|1 month ago
tel|1 month ago
https://www.realsenseai.com/products/real-sense-depth-camera...
That said, I don't think splats:voxels is quite pixels:vector graphics. Maybe a closer analogy would be that pixels:vectors is the same as voxels:3d mesh modeling. You might imagine a sophisticated animated character being created and then animated using motion-capture techniques.
But notice where these things fall apart, too. SVG shines when it's not just estimating the true form, but literally is it (fonts, simplified graphics made from simple strokes). If you try to estimate a photo using SVG it tends to get messy. Similar problems arise when reconstructing a 3d mesh from real-world data.
I agree that splats are a bit like pixels, though. They're samples of color and light in 3d space, the way pixels are in 2d. They represent the source more faithfully when they're more densely sampled.
The difference is that a splat is sampled irregularly, just where it's needed within the scene. That makes it more efficient at representing most useful 3d scenes (i.e., ones where there are a few subjects and objects in mostly empty space). It just uses data where that data has an impact.
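Rough arithmetic on that efficiency point, using my own ballpark of ~60 float32 parameters per splat (about what common 3DGS exports carry):

    splats = 2_000_000
    bytes_per_splat = 60 * 4               # ~60 float32 parameters each
    print(splats * bytes_per_splat / 1e9)  # ~0.48 GB for the whole scene,
                                           # vs ~17 GB for a dense 1024^3
                                           # voxel grid at 16 bytes/voxel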
MITSardine|1 month ago
corysama|1 month ago
Photogrammetry works well for what it does. But it's mostly only effective for opaque, diffuse, solid surfaces. It can't handle transparency, reflection, or "fuzz". Capturing material response is possible but requires expensive setups.
A scene like this poodle https://superspl.at/view?id=6d4b84d3 or this bee https://superspl.at/view?id=cf6ac78e would be pretty much impossible with photogrammetry and very difficult with manual, traditional, polygon workflows. Those are not videos. Spin them around.
dahart|1 month ago
BTW I believe there is software that can turn point clouds into textured meshes reliably; multiple techniques even, depending on what your goals are.
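Open3D's Poisson surface reconstruction is one concrete example (it also ships ball pivoting and alpha shapes). A sketch, assuming a point cloud saved as scan.ply; the filename is mine, and texturing from the source photos would be a separate step:

    import open3d as o3d

    pcd = o3d.io.read_point_cloud("scan.ply")
    pcd.estimate_normals()  # Poisson needs oriented normals
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9)       # higher depth = finer surface detail
    o3d.io.write_triangle_mesh("scan_mesh.ply", mesh)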
baxuz|1 month ago
This includes sparse areas like fences, vegetation, and the like, but more importantly any material properties like reflections, specularity, opacity, etc.
Here are a few great examples: https://superspl.at/view?id=cf6ac78e
https://superspl.at/view?id=c67edb74
cubefox|1 month ago
I would say it's a 3D photo, not a 3D video. But there are already extensions to dynamic scenes with movement.