
SPAD: Spatially Aware Multiview Diffusers

127 points | PaulHoule | 2 years ago | yashkant.github.io | reply

34 comments

[+] fxtentacle|2 years ago|reply
Reducing geometric detail while keeping outlines intact is one of the major showstoppers preventing current game engines from having realistic foliage. That exact same problem is also why a NeRF, with its near-infinite geometric detail, is impractical to use for games. And this paper is yet another way to produce a NeRF.

SpeedTree already used billboard textures 10 years ago, and that's still the way to go if you need a forest in UE5. Fortnite improved on that slightly by having multiple billboard textures that get swapped based on viewing angle; they call those impostors. But the core issue of how to reduce overdraw and poly count when starting from a high-detail object is still unsolved.
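The angle-swapped billboards described above can be sketched in a few lines. Everything here (the view count, the function name) is illustrative, not any engine's actual API:

```python
# Hypothetical impostor setup: billboard textures captured at evenly
# spaced yaw angles around the tree. NUM_VIEWS and the function name
# are made up for illustration.
NUM_VIEWS = 8
WEDGE = 360.0 / NUM_VIEWS  # each captured view covers a 45-degree wedge

def pick_impostor_view(camera_yaw_deg: float) -> int:
    """Return the index of the captured billboard closest to the camera yaw."""
    yaw = camera_yaw_deg % 360.0
    return round(yaw / WEDGE) % NUM_VIEWS

print(pick_impostor_view(0.0))    # view facing the camera head-on
print(pick_impostor_view(50.0))   # snaps to the 45-degree capture
print(pick_impostor_view(350.0))  # wraps back around to view 0
```

At runtime the renderer picks one texture per tree per frame, so a whole forest costs a handful of quads regardless of leaf count, which is the overdraw and poly-count win being described.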

That's also the reason, BTW, why UE5's Nanite is used only for mostly solid objects like rocks and statues, but not for trees.

But until this is solved, you always need a technical artist to make a low poly mesh onto whose textures you can bake your high resolution mesh.

[+] jsheard|2 years ago|reply
Nanite can actually do trees now, and Fortnite is using it in production, with fully modelled leaves rather than cutout textures because that turned out to be more efficient under Nanite. They talk about it here: https://www.unrealengine.com/en-US/tech-blog/bringing-nanite...

That's still ultimately triangle meshes, though, not some other weird representation like NeRF, or distance fields, or voxels, or any of the other supposed triangle-killers that didn't stick. Triangles are proving very difficult to kill.

[+] iandanforth|2 years ago|reply
Please note that these results were obtained using a small amount of compute (compared to, say, a large language model training run) on a limited training set. Nothing in the paper makes me think that this won't scale. I wouldn't be surprised to see a AAA-quality version of this within a few months.
[+] whimsicalism|2 years ago|reply
No major comment other than this tech is obviously going to transform gaming.
[+] Etherlord87|2 years ago|reply
To my understanding it produces 2D images (from various angles), not 3D models… But sure, it's very close to producing a 3D model.
[+] Teknomancer|2 years ago|reply
My immediate thoughts as well, but I wonder how exactly this could be implemented in current game build chains such as Unreal or Unity?
[+] bugglebeetle|2 years ago|reply
I’m confused why there is so much focus on text to images and models. If you spent five minutes talking to anyone with artistic ability, they would tell you that this is not how they generate their work. Making images involves entirely different parts of reasoning than that for speech and language. We seem to be building an entirely faulty model of image generation (outside of things like ControlNet) on the premise that text and images are equivalent, solely because that’s the training data we have.
[+] deepnet|2 years ago|reply
Can you share some of what you have found about the creative process by talking to people with artistic ability ?

What are your ideas about the differences between a human and AI's creative process ?

Are there any similarities, or analogous processes?

Do you think creators have a kind of latent space where different concepts are inspired by multi-modal inputs (what sparks inspiration? e.g. sometimes music or a mood inspires a picture), and then the creators make different versions of their idea by combining different amounts of different concepts?

I am not being snarky, I am genuinely interested in views comparing human and AI creative processes.

[+] ummonk|2 years ago|reply
Project briefs to an artist typically contain both text and reference images. Image diffusion models and the like likewise typically use a text prompt together with optional reference images.
[+] refulgentis|2 years ago|reply
Not even wrong, in the Pauli sense: to engage requires ceding the incorrect premises that image models only accept text as input and that the generation process relies on this text
[+] astrange|2 years ago|reply
Text prompts aren't an essential part of this technology. They're being used as the interface to generation APIs because they're easy to build, easy to moderate, and, for Discord-based models like Midjourney, they make it easy for people to copy your work.

With a local model you can find latent-space coordinates any way you want, and patch the pixel-generation model any way you want too. (The above are usually called textual inversion and LoRAs.)
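For anyone unfamiliar, the LoRA patching mentioned above amounts to adding a trainable low-rank delta on top of a frozen weight matrix, so the base model is never modified in place. A minimal NumPy sketch of the idea (shapes, seed, and names are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of some layer inside the generator (shape is illustrative).
d_out, d_in, rank = 64, 64, 4
W = rng.standard_normal((d_out, d_in))

# LoRA: instead of touching W, learn a low-rank delta B @ A (rank << d_in).
A = rng.standard_normal((rank, d_in)) * 0.01  # would be trained on your data
B = np.zeros((d_out, rank))                   # zero init: patch starts as a no-op

def forward(x, scale=1.0):
    """Base layer plus the LoRA patch, blended by `scale`."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B still zero, the patched layer matches the base layer exactly,
# and setting scale=0 disables the patch at any time.
print(np.allclose(forward(x), W @ x))  # True
```

Because only A and B are trained, the patch is tiny compared to the base model, which is why LoRAs are cheap to make and swap on a local setup.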

I would personally like to see a system that can input and output layers instead of a single combined image.

[+] teaearlgraycold|2 years ago|reply
It’s good for stock images.

And for in-painting I think you’ll find text-to-image is still useful to artists. It’s extra metadata to guide the generation of a small portion of the final image.

[+] nobut8|2 years ago|reply
Not sure what these cars are all about. Everyone travels by horse and buggy…

We’re building a model optimized for the machine, not people.

Artists can go collect clay to sculpt and flowers to convert to paint. Computers are their own context and should not be romantically anthropomorphized.

In the same way fewer and fewer people go to church, fewer and fewer will see the nostalgia in being a data entry worker all day. Society didn’t stop when we all got our first beige box.

[+] ilkke|2 years ago|reply
Check out invoke.ai for an example of something much closer to a professional tool.
[+] gruturo|2 years ago|reply
or Single Photon Avalanche Diode, coming to a LIDAR near you very soon if not already.

Yay ambiguous acronyms.

[+] spacebacon|2 years ago|reply
Fortsense FL6031, automotive-ready. For anyone not familiar with SPAD (Single Photon Avalanche Diode), YouTube it. Very impressive computational imagery through walls, around corners, and such.
[+] JasonFruit|2 years ago|reply
Société Pour L'Aviation et ses Dérivés