
WorldGen – Text to Immersive 3D Worlds

253 points | smusamashah | 3 months ago | meta.com

81 comments

[+] schmichael|3 months ago|reply
It’s a fun demo but they never go into buildings, the buildings all have similar size, the towns have similar layouts, there’s numerous visual inconsistencies, and the towns don’t really make sense. It generates stylistically similar boxes, puts them on a grid, and lets you wander the spaces between?

I know progress happens in incremental steps, but this seems like quite the baby step from other world gen demos unless I’m missing something.

[+] thwarted|3 months ago|reply
> they never go into buildings, the buildings all have similar size, the towns have similar layouts, there’s numerous visual inconsistencies, and the towns don’t really make sense

These AI generated towns sure do seem to have strict building and civic codes. Everything on a grid, height limits, equal spacing between all buildings. The local historical society really has a tight grip on neighborhood character.

From the article:

> It would also be sound, with different areas connected in such a way to allow characters to roam freely without getting stuck.

Very unrealistic.

One of the interesting things about mostly-open world game environments, like GTA or Cyberpunk, is the "designed" messiness and the limits that result in dead ends. You poke at someplace and end up at a locked door (a texture that looks like a door but you can't interact with) that says there's absolutely nothing interesting beyond where you're at. No chance to get stuck in a dead end is boring; when every path leads to something interesting, there's no "exploration".

[+] jaccola|3 months ago|reply
This is potentially a lot more useful in creation pipelines than other demos (e.g. World Labs) if it uses explicit assets rather than a more implicit representation (gaussians are pretty explicit but not in the way we are used to working with in games etc...).

I do think Meta has the tech to easily match other radiance field based generation methods, they publish many foundational papers in this space and have Hyperscape.

So I'd view this as an interesting orthogonal direction to explore!

[+] ProofHouse|3 months ago|reply
is there a working 'demo'? I don't see one
[+] serf|3 months ago|reply
>It’s a fun demo but they never go into buildings, the buildings all have similar size, the towns have similar layouts, there’s numerous visual inconsistencies, and the towns don’t really make sense.

that's 95% of existing video games. How many doors actually work in a game like Cyberpunk?

on a different note, when do we mere mortals get to play with a worldgen engine? Google/Meta/Tencent have shown them off for a while now but without any real feasible way for a nobody to partake; are they that far away from actually being good?

[+] Difwif|3 months ago|reply
This just seems like an engineered pipeline of existing GenAI to get a 3d procedurally generated world that doesn't even look SOTA. I'm really sorry to dunk on this for those that worked on it, but this doesn't look like progress to me. The current approach looks like a dead end.

An end-to-end _trained_ model that spits out a textured mesh of the same result would have been an innovation. The fact that they didn't do that suggests they're missing something fundamental for world model training.

The best thing I can say is that maybe they can use this to bootstrap a dataset for a future model.

[+] kubb|3 months ago|reply
The people who worked on it did what they could to satisfy the demands of their higher-ups, who are frequently out of touch with the technical landscape.

Being kind to them and understanding the environment they work in won’t improve their lives, but it will expand our understanding of the capability of particular large companies to innovate.

[+] theptip|3 months ago|reply
What’s SOTA in this area right now?
[+] ranyume|3 months ago|reply
I'd call this 3DAssetGen. It's not a world model and doesn't generate a world at all. Standard sweat-and-blood powered world building puts this to shame, even low-effort world building with canned assets (see rpg maker games).
[+] wkat4242|3 months ago|reply
It's not really a world, no. It generates only a small square by the looks of it. And a world built out of squares will be annoying.

Still, it's a first effort. I do think AI can really help with world creation, which I think is one of the biggest barriers to the metaverse. When you see how much time and money it costs to create a small island world like GTA...

[+] ipsum2|3 months ago|reply
Nowhere on the page does it state that it's a world model.
[+] mwkaufma|3 months ago|reply
I would rather spend $5 at an asset store for some blobby generic buildings than orchestrate a 12-figure corporate debt bubble to build warehouses of rapidly depreciating rust that boils a lake in order to generate them, but I guess that's why I'm not a Business Genius.
[+] crashprone|3 months ago|reply
Or spend that $5 supporting folks like Quaternius, who offers really cool low-poly game assets. I wonder if 3D artists still have the will to give away assets for free these days.
[+] meander_water|3 months ago|reply
It's funny, I clicked the link to the demo, but it 404s, then I tried googling Worldgen, and it turns out someone else has built the same thing in May and called it Worldgen as well. Looks like it does better at realistic 3D scenes compared to this.

[0] https://worldgen.github.io/index.html

[+] jsheard|3 months ago|reply
That's pretty far from the same thing, their technique is a 2D image in a trenchcoat. It instantly falls apart if you move more than a foot or so from the original camera position.
[+] Fearlesspancake|3 months ago|reply
They use the word "interactive" several times, and I kept expecting that to mean truly interactive, i.e. the ability to open doors or pick up objects and use them, but they seem to use "interactive" to mean "able to view and explore from a first-person perspective". By that definition any 3D model is interactive.
[+] mmaunder|3 months ago|reply
Panorama gen via 2d diffusion inpainting, to point cloud lifting to 3d, to 2d inpainting conditioned on rendered point clouds, to optimization of a 3d gaussian splatting scene. It's image gen stitched into 3D. Not a conceptual world model. I hate the ambiguity of the term.
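The staging described above reads roughly like this sketch (every function name here is a hypothetical stand-in; each stub replaces a heavyweight model or optimizer in the real system, so only the stage order and data flow are illustrated):

```python
# Structural sketch of the described pipeline. All names are hypothetical
# stand-ins; each stub replaces a heavyweight model in the actual system.

def generate_panorama(prompt):
    # Stage 1: a 2D diffusion model inpaints an equirectangular panorama.
    return {"prompt": prompt, "image": "360-degree RGB panorama"}

def lift_to_point_cloud(panorama):
    # Stage 2: estimated per-pixel depth unprojects the panorama to 3D points.
    return {"source": panorama["prompt"], "points": "unprojected depth map"}

def inpaint_novel_views(point_cloud, n_views=4):
    # Stage 3: render the sparse cloud from new camera poses, then run 2D
    # inpainting conditioned on those renders to fill the occluded holes.
    return [{"pose": i, "holes_filled": True} for i in range(n_views)]

def fit_gaussian_splats(views):
    # Stage 4: optimize a 3D Gaussian splatting scene against the
    # completed set of views.
    return {"n_gaussians": 1000 * len(views), "renderable": True}

def text_to_scene(prompt):
    pano = generate_panorama(prompt)
    cloud = lift_to_point_cloud(pano)
    views = inpaint_novel_views(cloud)
    return fit_gaussian_splats(views)

scene = text_to_scene("a medieval town square")
```

Note that the text prompt only touches stage 1; everything downstream is image- and geometry-processing, which is why calling it a "world model" is a stretch.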
[+] Fraterkes|3 months ago|reply
Having the technical knowhow to have an ai generate 3d models, but then generatively compositing those assets together into environments in a way that would have seemed overly simplistic to gamedevs 3 decades ago…
[+] noduerme|3 months ago|reply
How many f*cking world gen models and datacenters will it take to realize that we all just want a better version of SimCity? And what an ironic thing to divert gigawatts of power and vast amounts of water to building: SimCity tiles and walkthrough (Potemkin) villages.

I still won't even get myself a PlayStation, explicitly because I know if I did I would lose half a year of my life to Red Dead. Who actually benefits from this technology, or is it just a cool demo?

[+] willyxdjazz|3 months ago|reply
It's funny, I don't know if I see a use for it, and this feeling surprises me. Just as procedural maps bore me, I feel this will be similar in any use case I can think of. What I like is the perceived care behind every action. After the initial "wow" of the care put into that research, I don't think it will end up being a "wow" that scales—I don't know if I'm making myself clear.
[+] galleywest200|3 months ago|reply
I loathe how meta.com makes my back button gray out in my browser. Stop trying to force me to stay, it is obnoxious.
[+] crazygringo|3 months ago|reply
The back button works fine for me (on Chrome). I can go back from the post to HN, and I can navigate to other pages on Meta and then go back.

What browser are you using? How is it even possible for a site to remove previous browser history in a tab?

[+] anotheryou|3 months ago|reply
It's more like a 3D asset generator sprinkling assets across a generic landscape. The "World" part falls a bit short, and the rounded 3D isn't even that good.
[+] elAhmo|3 months ago|reply
The vibe from the first video reminds me of Warcraft 3 and DotA.

DotA was effectively a simple map, yet it changed online gaming and e-sports, and I am sure there are millions or billions of hours spent by players in a very simple-looking landscape.

Compare that to what we have today: on-demand, unique, and significantly better looking. It is amazing how relatively small these objectively impressive achievements seem next to a simple map we had 20 years ago.

[+] hu3|3 months ago|reply
This is like the GPT-2 of world gen.

10 years from now we might have games that generate entire worlds based on the unique story line that's customized for each playthrough. Maybe even endless stories.

Baldur's Gate 5 is going to be memorable!

The Elder Scrolls could use this + Radiant AI for some neat quests once it improves.

Game studios are probably going to explore this in dungeon generators first where if things go wrong with the generation, not much is lost. Just exit and generate another.

[+] webdevver|3 months ago|reply
if they train it on public world data it'll be freaky if you give it a prompt of your address and it re-creates your house
[+] visioninmyblood|3 months ago|reply
Not sure what is going on, but it seems like Meta is lagging behind startups and other frontier models in this space. They invested more in Reality Labs over the last decade than any other company, and they come up with such poor rendering while competitors are making pretty cool real-world demos. Meta should stop treating these as research projects and actually spend time building real products with proper 3D rendering.
[+] Oarch|3 months ago|reply
Google released Genie 3 back in August, which seemed more compelling than this. I was surprised by how little fanfare it received.
[+] nitwit005|3 months ago|reply
I can see this working as a randomly generated map for some quick game, like the Worms games did in 2D.

But having everything sit so rigidly on a grid kind of ruins the feel. It's rare for every building to be isolated like that. I am guessing they had trouble producing neighboring buildings that looked like they could logically share a common wall or alleyway.

[+] mrdependable|3 months ago|reply
These look a lot like World of Warcraft. I wonder how much of their training data they got from it.
[+] philipwhiuk|3 months ago|reply
It's definitely a step forward from that 'Minecraft world' gen tech demo that had no persistence of vision.

I can see it being useful for isolated Unity developers with a concept and limited art ability. Currently they would likely be limited to pixel games.

[+] coffeebeqn|3 months ago|reply
They are not limited like that. You can get a game very far with asset store assets.