
GALA3D: Towards Text-to-3D Complex Scene Generation

78 points | jfoster | 2 years ago | gala3d.github.io | reply

28 comments

[+] bbor|2 years ago|reply
The media is claiming we’re in an AI Cold War with China, yet here’s a Sino-American paper casually talking about automating the simple task of the entirety of human spatial understanding. Maybe we can all get along, after all?

EDIT: is this paper… broken… for anyone else? I can't read past the first two pages on phone or Mac, haven't tried Linux. https://arxiv.org/pdf/2402.07207.pdf

[+] ben_w|2 years ago|reply
Individuals have an easier time cooperating than their respective governments.
[+] mihaaly|2 years ago|reply
Am I the only one reluctant to get into lengthy discussions with a computer about making 3D models (or any other creative activity, in fact) instead of just doing it? Especially when I have specific ideas about certain details that are not easy to put into words in a way that both I and the receiving end (in this case a computer) share the exact same (potentially cultural) understanding of. These pandas and things are fun early demos for play, but how far along is this, and more importantly, how reliable can it be? How useful can this eventually be for serious matters, where it really counts? Perhaps there is a reason that practically no real engineering discipline (the ones that work with 3D models professionally) relies on anecdotes, conversation, or storytelling, on going to the place of manufacture or construction and explaining with pretty words how to form the product. They rely on technical drawings following specific rules that limit what is said and how, rules that the people involved spend years of training learning to write and read. If casual conversation were enough, perhaps fields other than art and entertainment would not have their own restricted languages either (on top of specially formatted documents).

Make me a house... no, not that tall... a bit taller now... good... and a family home only... also it is ugly, make it look like in England... no, not modern England but old... not that old, from after the Victorian era... ah, not the ones made for coal miners, those are ugly, make it detached and something a middle class family in Surrey would like... no, that's too big... put it on the shelf darling... hey, I did not talk to you, do not put the house on a shelf!... good, but I do not have that much land, make it narrower... and have an American kitchen in it... no, not with an American stove but an induction one... and do not start the stairs right at the entrance, that is impractical... have the bathroom on the ground floor... I don't care if it is not a proper English home anymore, just do what I tell you... but not exactly how I tell you, do it the way I want!...

[+] amarant|2 years ago|reply
I think the best we can hope for, at least for the foreseeable future, is a workflow where you ask the AI to do a thing, it gets it 80% right (someone who's good at writing prompts might even get a bit more of it correct), and then you manually correct the things it didn't get exactly the way you wanted.

I think we're quite far from replacing skilled professionals entirely, but making them a lot more productive is within reach!

[+] yreg|2 years ago|reply
> These pandas and things are fun early demos for play, but how far along is this, and more importantly, how reliable can it be? How useful can this eventually be for serious matters, where it really counts?

This reminds me of the thoughts the industry had on transformers, attention, etc. It seems OpenAI had to train GPT-2 to show others (including Google itself!) that this is perhaps something worthy of more research. And even then, many doubted it had much further potential to improve and become more useful.

Maybe it (these generative 3D scenes) will lead to something and maybe it won't, but it is a very interesting thing to research.

[+] skocznymroczny|2 years ago|reply
I think for regular folks who just want to quickly generate something, it may look like that. But people who need precise control will get it. Look at the ControlNet extensions for Stable Diffusion and imagine something like that, but for 3D models. Perhaps you will provide a sketch of the 3D scene with rough object placement and the AI will fill in the rest. Or draw a fantasy map with points of interest and the AI will generate the landscape, terrain, and foliage to fit your description.
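A rough-layout interface like that could be as simple as a list of labeled bounding boxes that a generator fleshes out. A minimal sketch of such a spec; all names here are hypothetical illustrations, not GALA3D's actual API:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical rough-layout spec: labeled boxes the generator would fill in.
@dataclass
class SceneObject:
    label: str       # per-object text prompt, e.g. "oak bookshelf"
    position: tuple  # (x, y, z) center of the box, in scene units
    size: tuple      # (width, height, depth) of the bounding box

def layout_to_json(objects):
    """Serialize the rough layout so a (hypothetical) text-to-3D backend
    could consume it alongside an overall scene prompt."""
    return json.dumps([asdict(o) for o in objects], indent=2)

layout = [
    SceneObject("panda sitting on a chair", (0.0, 0.0, 0.0), (1.0, 1.5, 1.0)),
    SceneObject("wooden table", (2.0, 0.0, 0.0), (2.0, 1.0, 1.0)),
]
print(layout_to_json(layout))
```

The point is that the human supplies only coarse, spatial intent; everything below the bounding-box level is left to the model, which is roughly the split the comment above describes.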
[+] woctordho|2 years ago|reply
If natural language can't exactly describe what you want, just generate more variations and let Monte Carlo sampling cover what you're after.
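If you have any way to score candidates, "generate more variations" can be mechanized as best-of-N sampling. A toy sketch with a stand-in generator and scorer (in practice the generator would be an expensive text-to-3D call and the scorer something like CLIP similarity to the prompt, or a human click; both are assumptions here):

```python
import random

def generate_scene(prompt, seed):
    """Stand-in for an expensive text-to-3D call: a seeded stub that
    returns a fake scene record with a deterministic 'quality' value."""
    rng = random.Random(seed)
    return {"prompt": prompt, "seed": seed, "quality": rng.random()}

def score(scene):
    """Stand-in preference score; a real one might measure how well the
    generated scene matches the prompt."""
    return scene["quality"]

def best_of_n(prompt, n=16):
    """Sample n variations and keep the highest-scoring one."""
    candidates = [generate_scene(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)

best = best_of_n("a panda reading a book", n=16)
```

The obvious trade-off is cost: every extra variation is a full generation run, so this only covers "the way you want" as far as your compute budget stretches.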
[+] Larok00|2 years ago|reply
I wish there was more progress in text-to-3D mesh for creating basic but very specific and functional shapes. With the last few years of progress, it really feels like it should be possible, but none of the big players are finding it worthwhile to look at. It would give the 3D printing community a massive boost.
[+] qwery|2 years ago|reply
Depends what "very specific and functional shapes" you have in mind, I guess. For my use of a 3D printer, I believe something like OpenSCAD is going to be a significantly more efficient textual description of the object than ~english -- for me and the computer.
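To make that concrete, here is what the "textual description" for a functional part looks like in code form: a short Python sketch that emits OpenSCAD source for a hypothetical corner-screw mounting plate (all dimensions invented for illustration):

```python
def mounting_plate_scad(width=40, depth=30, thickness=3, hole_d=3.2, margin=5):
    """Emit OpenSCAD source for a rectangular plate with a screw hole in
    each corner -- the kind of exact, functional shape that is awkward to
    pin down in plain English but trivial to parameterize in code."""
    holes = []
    for x in (margin, width - margin):
        for y in (margin, depth - margin):
            # Holes overshoot the plate by 1 unit on each side so the
            # boolean subtraction cuts cleanly through both faces.
            holes.append(
                f"translate([{x}, {y}, -1]) "
                f"cylinder(h={thickness + 2}, d={hole_d}, $fn=32);"
            )
    holes_src = "\n    ".join(holes)
    return f"""difference() {{
    cube([{width}, {depth}, {thickness}]);
    {holes_src}
}}"""

print(mounting_plate_scad())
```

Every dimension is a named parameter, so "make the holes fit M4 screws" is a one-argument change rather than a negotiation in prose.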
[+] matroid|2 years ago|reply
Hi, can you explain this problem a bit more? I’m a new PhD student and love low-hanging fruit.
[+] lupusreal|2 years ago|reply
Do we really want a boom in the field of half-baked plastic trinkets made on a whim without much consideration put into them? I think something made on a whim is a lot more likely to be discarded. If somebody wants something made out of plastic, it should at least be something they're sure they want. Having a human invest some time in designing it seems like a good thing.
[+] yorwba|2 years ago|reply
When you want a very specific shape, text is probably not the right input modality. See also image generation, where to get very specific outputs you're better off defining the large-scale structure spatially with a ControlNet and only using text for the visual style and decorative details that do not need to be precisely controlled.

What shapes would you ask a text-to-3D model to create for you?

[+] amarant|2 years ago|reply
Oh my, now that is exciting! A friend of mine has implemented procedural rigging and physics-based generative animations used in some pretty big-name games. Combine those two technologies with this and you've seriously lowered the bar to entry for video game production!
[+] mdre|2 years ago|reply
Was this at Ubisoft maybe? I remember their presentations, some really amazing tech a decade ago... Never seen anything as straightforward to use in off-the-shelf software since.