Reducing geometric detail while keeping outlines intact is one of the major showstoppers that prevent current game engines from having realistic foliage. That exact same problem is also why a NeRF, with its near-infinite geometric detail, is impractical to use for games. And this paper is yet another way to produce a NeRF.
SpeedTree already used billboard textures 10 years ago, and that's still the way to go if you need a forest in UE5. Fortnite improved slightly on that by having multiple billboard textures that get swapped based on viewing angle; they call those impostors. But the core issue of how to reduce overdraw and poly count when starting with a high-detail object is still unsolved.
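For anyone who hasn't implemented one: an impostor is just a pre-rendered billboard picked from an atlas based on the view direction. A minimal sketch of the selection step; the 8x4 atlas layout and z-up convention here are made-up examples, not any particular engine's scheme:

    import math

    # Pick an atlas cell (column, row) for an impostor, given the unit
    # direction from the object toward the camera. z is "up" here.
    def impostor_frame(cam_dir, n_azimuth=8, n_elevation=4):
        x, y, z = cam_dir
        azimuth = math.atan2(y, x) % (2 * math.pi)      # angle around the trunk
        elevation = math.asin(max(-1.0, min(1.0, z)))   # angle above the horizon
        col = int(azimuth / (2 * math.pi) * n_azimuth) % n_azimuth
        # Trees are usually only baked for the upper hemisphere.
        row = min(int(max(0.0, elevation) / (math.pi / 2) * n_elevation),
                  n_elevation - 1)
        return col, row

    print(impostor_frame((1.0, 0.0, 0.0)))  # viewed from the side -> (0, 0)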
That's also the reason, BTW, why UE5's Nanite is used only for mostly solid objects like rocks and statues, but not for trees.
But until this is solved, you always need a technical artist to make a low-poly mesh onto whose textures you can bake your high-resolution mesh.
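For context, the bake itself boils down to a per-texel ray query from the low-poly shell into the high-poly mesh. A rough sketch; raycast() is a hypothetical helper standing in for whatever ray tracer you have around, and search_dist is a made-up tuning knob:

    # For each texel of the low-poly unwrap, find the nearby high-poly
    # surface and store its normal. raycast(origin, direction) -> normal
    # or None is a hypothetical helper, not a real library call.
    def bake_normals(texels, raycast, search_dist=0.05):
        baked = {}
        for (u, v), (pos, normal) in texels.items():
            # Look from just outside the low-poly shell back toward it,
            # falling back to the low-poly normal if nothing is hit.
            start = tuple(p + n * search_dist for p, n in zip(pos, normal))
            hit = raycast(start, tuple(-n for n in normal))
            baked[(u, v)] = hit if hit is not None else normal
        return baked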
Nanite can actually do trees now, and Fortnite is using it in production, with fully modelled leaves rather than cutout textures because that turned out to be more efficient under Nanite. They talk about it here: https://www.unrealengine.com/en-US/tech-blog/bringing-nanite...
That's still ultimately triangle meshes though, not some other weird representation like NeRF, or distance fields, or voxels, or any of the other supposed triangle-killers that didn't stick. Triangles are proving very difficult to kill.
Please note that these results were obtained using a small amount of compute (compared to, say, a large language model training run) on a limited training set. Nothing in the paper makes me think that this won't scale. I wouldn't be surprised to see an AAA-quality version of this within a few months.
I’m confused why there is so much focus on text-to-image models. If you spent five minutes talking to anyone with artistic ability, they would tell you that this is not how they generate their work. Making images involves entirely different modes of reasoning than speech and language do. We seem to be building an entirely faulty model of image generation (outside of things like ControlNet) on the premise that text and images are equivalent, solely because that’s the training data we have.
Can you share some of what you have found about the creative process by talking to people with artistic ability?
What are your ideas about the differences between a human's and an AI's creative process?
Are there any similarities, or analogous processes?
Do you think creators have a kind of latent space where different concepts are inspired by multi-modal inputs (what sparks inspiration? e.g. sometimes music or a mood inspires a picture), and then the creators make different versions of their idea by combining different amounts of different concepts? (A rough sketch of what that combining looks like mechanically follows below.)
I am not being snarky; I am genuinely interested in views comparing human and AI creative processes.
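For concreteness, the simplest mechanical analogue on the model side is just a weighted mix of concept embeddings. A crude illustrative sketch; the vectors and weights below are stand-ins, not taken from any real model:

    import numpy as np

    # "Combining different amounts of different concepts": a weighted
    # mix of concept embeddings, renormalized to unit length.
    def blend(concepts, weights):
        v = sum(w * c for w, c in zip(weights, concepts))
        return v / np.linalg.norm(v)

    music = np.random.randn(768)   # stand-ins for learned embeddings
    mood = np.random.randn(768)
    idea = blend([music, mood], [0.7, 0.3])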
Project briefs given to an artist typically contain both text and reference images. Image diffusion models and the like, likewise, typically use a text prompt together with optional reference images.
Not even wrong, in the Pauli sense: to engage requires conceding the incorrect premises that image models only accept text as input and that the generation process relies on this text.
Text prompts aren't an essential part of this technology. They're being used as the interface to generation APIs because it's easy to build, easy to moderate, and for the Discord models like Midjourney it's easy for people to copy your work.
With a local model you can find latent space coordinates any way you want, and patch the pixel generation model any way you want too. (The above are usually called textual inversion and LoRAs.)
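Mechanically a LoRA is tiny: a frozen layer plus a trainable low-rank correction. A minimal PyTorch sketch, not tied to any particular diffusion model:

    import torch

    # Frozen base layer plus a trainable low-rank update:
    # y = W x + (alpha / r) * B A x, with A and B tiny compared to W.
    class LoRALinear(torch.nn.Module):
        def __init__(self, base: torch.nn.Linear, r: int = 4, alpha: float = 4.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)  # original weights stay untouched
            self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale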
I would personally like to see a system that can input and output layers instead of a single combined image.
And for in-painting I think you’ll find text-to-image is still useful to artists. It’s extra metadata to guide the generation of a small portion of the final image.
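With the diffusers library that workflow is already a few lines. A sketch, assuming an inpainting checkpoint is available; the checkpoint name, file names, and prompt are placeholder examples:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Repaint only the masked region, steered by a short text prompt.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",  # example checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    image = Image.open("scene.png").convert("RGB")  # the full picture
    mask = Image.open("mask.png").convert("RGB")    # white = regenerate here

    out = pipe(
        prompt="gnarled oak tree, golden hour light",  # guides just the patch
        image=image,
        mask_image=mask,
    ).images[0]
    out.save("scene_inpainted.png")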
Not sure what these cars are all about. Everyone travels by horse and buggy…
We’re building a model optimized for the machine, not people.
Artists can go collect clay to sculpt and flowers to convert to paint. Computers are their own context and should not be romantically anthropomorphized.
In the same way fewer and fewer people go to church, fewer and fewer will see the nostalgia in being a data entry worker all day. Society didn’t stop when we all got our first beige box.
Fortsense FL6031 - automotive ready. For anyone not familiar with SPADs (Single Photon Avalanche Diodes), YouTube it. Very impressive computational imaging through walls, around corners, and such.
Yay ambiguous acronyms.