I was literally thinking about this in bed before falling asleep last night, but I was imagining scenes rather than objects.
I'm in no way an ML developer (I'm not even a developer), but I was wondering how hard it would be to generate an apartment / house layout from a (somewhat detailed) description.
I'm sure the first use that springs to mind is for architects and real estate developers / agents / brokers, but one of my pet-projects-I'll-never-really-do is to recreate memories in 3D from descriptions and old photos. Imagine being able to relive your childhood memories! Just throwing it out there in case one of you smart folks wants to take that and run with it...
Really cool idea, I wonder why nobody tried this earlier.
However, the main problem I have with this approach is the voxels. They model the geometry and only the geometry. The far more important aspect of topology is left out. Thus, the results suffer from the same problems as 3D scanning / photogrammetry: they are practically unusable, as they cannot even be textured, let alone animated or used in fabrication (except for Lego models, I guess). So point 5 of the future work is the biggest one in my opinion.
Voxels are directly analogous to 2D pixels, which means they benefit from the myriad CNN-based techniques developed over the last few years (GANs in particular).
3D manifolds are sparse, and more analogous to 2D vector graphics. There are approaches for dealing with this type of data (e.g. spectral graph NNs), but they don't work as well for 3D topology as CNNs do for dense pixel data, as far as I know.
In the near term, it might be better to explore approaches that use voxels, then generate the topology heuristically.
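A rough sketch of what "voxels first, surface afterwards" could look like in its crudest form. This is not the method from the post: it assumes the occupancy grid is just a set of integer `(x, y, z)` cells, and `exposed_faces` is a made-up helper that keeps only the voxel faces bordering empty space. A real pipeline would more likely use something like marching cubes followed by a retopology step.

```python
# Assumption: a voxel model is a set of occupied integer (x, y, z) cells.
NEIGHBOR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def exposed_faces(voxels):
    """Return (cell, outward_direction) for every voxel face next to empty space.

    Hypothetical helper: these boundary faces form a blocky surface shell,
    which is the starting point a smarter topology heuristic would clean up.
    """
    occupied = set(voxels)
    faces = []
    for (x, y, z) in occupied:
        for (dx, dy, dz) in NEIGHBOR_OFFSETS:
            if (x + dx, y + dy, z + dz) not in occupied:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces


# A solid 2x2x2 block: interior faces are dropped, only the shell remains.
cube = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
print(len(exposed_faces(cube)))  # 24 boundary faces (6 sides x 4 cells each)
```

The output is still only geometry (a bag of quads), so it illustrates the point above: the interesting part is whatever heuristic turns this shell into clean, animatable topology.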
Electronic Arts used 3D scanning / photogrammetry in the new Star Wars Battlefront II.
You're right - the topology is a huge problem - but we're starting to see specialized tools that can take minimal input from the artist and automate the topology creation workflow.
Here's the GDC talk where the EA devs go over how they converted the scans to game-usable 3D models.
https://youtu.be/U_WaqCBp9zo
Well they have at least for certain parts. I'd guess based on what they use, the whole workflow here was really only possible in the last couple of years.
I'll update this comment another time but back in 2016 my company needed something like this and there was already research using GANs to generate objects from basic parameter inputs. The text portion wasn't there.
I believe that research came out of Stanford actually.
Ya, using voxels was definitely a big drawback of my approach here, but it was more intended to just see what's possible.
I think a better approach would be something similar to the StructureNet paper I mention in the post: use graph-based models to actually attempt to capture the topology. But they did it with super explicitly defined part trees as training data. The hard part would be finding a way to do that in an unsupervised manner so you could actually make use of the massive amount of unlabelled 3D models available.
Yeah, since all they would show were rather cruddy voxel shapes, and they didn't start the text with "It actually includes the physics from the start", I didn't bother to read the whole thing.
But it very much looks like a very advanced way to produce something slightly less useful than a sketch on a napkin, since the latter was made with an actual understanding of what the description is supposed to mean.
Very short sighted. The underlying technology is fundamentally more powerful. Also the whole tool ecosystem has improved, so going from voxels -> surfaces is better.
Amazing! I do work in 3D computer vision and I've really wanted to get into 3D generative asset design as a side project at least. If the author or anyone else is interested in collaborating, I'd be down to talk and shop something.
Oh wow, count me in. You could use Google Cloud Speech to go from voice -> text, and then Google Cloud Natural Language to go from text -> simple command interface.
I've got both working in Python, but don't do anything with the parsed text.
I want to do things like
"create sphere"
"move that up"
"no, bigger"
Great work! I wonder if using SDFs (signed distance functions) as the 3D model representation, rather than voxels, would result in higher-resolution model generation. Some recent work in that area is here: https://github.com/marian42/shapegan
Adversarial Generation of Continuous Implicit Shape Representations: https://arxiv.org/abs/2002.00349
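To make the SDF-vs-voxel point concrete, here is a small sketch. The SDF below is a hand-written sphere, not a learned network like in the linked work, and `voxelize` is a hypothetical helper. The idea it shows: an SDF is a continuous function you can query at any point, so the same shape can be sampled at whatever grid resolution you like, whereas a voxel grid is fixed at the resolution it was generated at.

```python
import math


def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a sphere at the origin: negative inside, zero on the surface."""
    return math.sqrt(x * x + y * y + z * z) - radius


def voxelize(sdf, resolution, extent=1.5):
    """Sample `sdf` at the cell centers of a resolution^3 grid over [-extent, extent]^3.

    Hypothetical helper: returns the set of (i, j, k) cells whose center is inside.
    """
    step = 2 * extent / resolution
    centers = [-extent + (i + 0.5) * step for i in range(resolution)]
    return {
        (i, j, k)
        for i, x in enumerate(centers)
        for j, y in enumerate(centers)
        for k, z in enumerate(centers)
        if sdf(x, y, z) <= 0
    }


# Same continuous shape, two different sampling resolutions.
coarse = voxelize(sphere_sdf, 8)
fine = voxelize(sphere_sdf, 32)

# Occupied volume = cell count * cell volume; it approaches 4/3 * pi as resolution grows.
fine_volume = len(fine) * (3.0 / 32) ** 3
print(len(coarse), len(fine), fine_volume)
```

Note this only addresses resolution. It does nothing for the topology concern raised elsewhere in the thread.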
I hadn't seen that paper, looks really interesting! It definitely would improve the resolution but I think fundamentally it still runs into the issues I had with voxels where it mostly just learns low level correlations and not high level topology.
This is fantastic. Much better than my idea of just using text recognition to search a database of models. I hope this research continues so in the years to come I can slap on a VR headset and generate scenes like the holodeck.
It would be interesting to train the 3D Designer on more classes. Then have the text generated from something like AI Dungeon 2 feed into it and see what 3D designs it generates.
I've been thinking along similar lines at a high level: this approach will likely be very general and can be used to create different kinds of media, and potentially behaviours at a later stage.
Importing libraries...
ModuleNotFoundError: No module named 'plotly'
Traceback:
File "/usr/local/lib/python3.7/site-packages/streamlit/ScriptRunner.py", line 314, in _run_script
exec(code, module.__dict__)
File "/app/shape/streamlit_app.py", line 15, in <module>
import plotly
airstrike | 6 years ago
adfm | 6 years ago
Wallacei X Cybertruck: https://youtu.be/bLZf-MNRoyg
tarsiel | 6 years ago
rkagerer | 6 years ago
I really want to try the live demo and make some melty furniture of my own, but I got this error:
cheschire | 6 years ago
The scene where the crew describes their mass dream into a horror dentist chair is forever burned into my brain as a technology I want in my lifetime.
starstorms | 6 years ago
Lichtso | 6 years ago
Jack000 | 6 years ago
abj | 6 years ago
AndrewKemendo | 6 years ago
starstorms | 6 years ago
masto | 6 years ago
Or did you mean the way the model is specified?
sgt101 | 6 years ago
SiempreViernes | 6 years ago
JabavuAdams | 6 years ago
heyitsguay | 6 years ago
gaogao | 6 years ago
Essentially, people in VR would be able to point and speak to create objects in a shared space.
JabavuAdams | 6 years ago
I want to do things like "create sphere" "move that up" "no, bigger"
etc.
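For what it's worth, a toy version of that last step (parsed text -> scene edits) could look like the sketch below. Everything here is an assumption for illustration: the scene is a plain list of dicts, "that" always means the most recently created object, and "bigger" is read as "scale it up by 1.5x". It takes text that has already been transcribed and does not touch any speech or NLP API.

```python
import re

# Assumed vocabulary; a real interface would need far more than this.
DIRECTIONS = {"up": (0, 1, 0), "down": (0, -1, 0), "left": (-1, 0, 0), "right": (1, 0, 0)}
SCALES = {"bigger": 1.5, "smaller": 1 / 1.5}


def apply_command(scene, text):
    """Apply one phrase to `scene` (a list of object dicts) and return the scene.

    Hypothetical rules: "create <shape>" adds an object at the origin;
    any other phrase modifies the last object created.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if "create" in words:
        idx = words.index("create") + 1
        if idx >= len(words):
            raise ValueError("create what?")
        scene.append({"shape": words[idx], "pos": [0, 0, 0], "scale": 1.0})
        return scene
    if not scene:
        raise ValueError("nothing to modify yet")
    target = scene[-1]
    for word in words:
        if word in DIRECTIONS:
            target["pos"] = [p + d for p, d in zip(target["pos"], DIRECTIONS[word])]
        elif word in SCALES:
            target["scale"] *= SCALES[word]
    return scene


scene = []
for phrase in ["create sphere", "move that up", "no, bigger"]:
    apply_command(scene, phrase)
print(scene)  # [{'shape': 'sphere', 'pos': [0, 1, 0], 'scale': 1.5}]
```

The hard part is obviously the context tracking ("that", "no, ..."), which this sketch dodges by always targeting the last object.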
mendeza | 6 years ago
starstorms | 6 years ago
afpx | 6 years ago
tlack | 6 years ago
(I couldn't find it in the linked article, might have missed it)
raidicy | 6 years ago
whichquestion | 6 years ago
Agebor | 6 years ago
https://metapresent.org/creation-engine
It will be interesting to base it on a decentralised open platform that could be built into the Internet.
bufferoverflow | 6 years ago
noajshu | 6 years ago
ebg13 | 6 years ago
hyperion2010 | 6 years ago
Flat surface with bevelled edge and 4 Ionic columns as legs that fade into nymphs at the base.
Table with caryatid legs.
Crescent wrench that doubles as a corkscrew.
Staple remover.
abhinai | 6 years ago
vsskanth | 6 years ago
"sofa with 3 cushions and a round arm" should generate a model and match with similar looking products
systemvoltage | 6 years ago
ExSoax | 6 years ago
qayxc | 6 years ago
[1] https://t2i.cvalenzuelab.com
[2] https://towardsdatascience.com/text-to-image-a3b201b003ae
britmob | 6 years ago
camillovisini | 6 years ago