
nobbis | 3 years ago

Key step in generating 3D – ask Stable Diffusion to score views from different angles:

  for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
    text = f"{ref_text}, {d} view"
https://github.com/ashawkey/stable-dreamfusion/blob/0cb8c0e0...
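The snippet above can be wrapped into a small helper that pairs each rendered view with its prompt; the direction list and prompt template come from the comment, while the function name is my own:

```python
def view_prompts(ref_text, directions=('front', 'side', 'back', 'side', 'overhead', 'bottom')):
    """Build one view-conditioned text prompt per rendering direction.

    'side' appears twice because the left and right views share a prompt.
    """
    return [f"{ref_text}, {d} view" for d in directions]

prompts = view_prompts("a DSLR photo of a squirrel")
# Each rendered view is then scored against its matching prompt.
```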

shadowgovt | 3 years ago

I'm modestly surprised that those few angles give us enough data to build out a full 3D render, but I guess I shouldn't be: the tech has been in high demand and well understood for years. That kind of front-cut / side-cut image is what 3D artists use for their initial prototypes when they're working from real-life models.

nobbis | 3 years ago

DreamFusion doesn't directly build a 3D model from those generated images. It starts with a completely random 3D voxel model, renders it from 6 different angles, then asks Stable Diffusion how plausible each rendering is as an image of "X, side view".

It then sprinkles some noise on the rendering, has Stable Diffusion improve it a little, then adjusts the voxels to produce that improved image (using differentiable rendering).

Rinse and repeat for hours.
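That loop can be sketched in heavily simplified form. Everything below is a toy stand-in of my own: the real method optimizes NeRF/voxel parameters through a volumetric renderer against a frozen Stable Diffusion model, whereas here the "renderer" is a fixed linear map per view and the "denoiser" just pulls the noisy render toward a fixed per-view target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the 3D representation: a flat parameter vector
# (the real method optimizes voxel/NeRF parameters).
theta = rng.normal(size=64)

def render(theta, angle):
    """Toy differentiable 'renderer': a fixed linear projection per view.

    Returns the rendered 'image' and the Jacobian (here just the matrix W),
    standing in for backprop through a real differentiable renderer.
    """
    rng_a = np.random.default_rng(angle)  # deterministic per-view weights
    W = rng_a.normal(size=(64, 64)) / 8.0
    return W @ theta, W

def denoise(noisy, angle):
    """Toy stand-in for Stable Diffusion: nudges the noisy rendering
    toward a fixed per-view target image."""
    rng_a = np.random.default_rng(1000 + angle)
    target = rng_a.normal(size=64)
    return noisy + 0.5 * (target - noisy)

lr, sigma = 0.05, 0.1
errs = []  # track how far the renders are from the 'improved' images
for step in range(200):
    total = 0.0
    for angle in range(6):  # six views, as in the snippet above
        img, W = render(theta, angle)
        noisy = img + sigma * rng.normal(size=64)   # sprinkle some noise
        improved = denoise(noisy, angle)            # let the 'model' improve it
        # Score-distillation-style update: push the 3D parameters so the
        # render matches the improved image, backpropagated through W.
        theta -= lr * W.T @ (img - improved)
        total += float(np.mean((img - improved) ** 2))
    errs.append(total)
```

Repeating this for many steps drives the renders from every angle toward images the (toy) model considers plausible, which is the "rinse and repeat for hours" part.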

mhuffman | 3 years ago

I don't think NeRFs require very many images to produce impressive results.

dwallin | 3 years ago

Given the way the language model works, these words could have multiple meanings. I wonder whether training a form of textual inversion to represent these concepts more directly might improve the results. You could even try teaching it to represent finer-grained angle adjustments.
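The core of that idea can be sketched in miniature. In real textual inversion you optimize a new token's embedding against the diffusion model's denoising loss while everything else stays frozen; here, as a toy stand-in of my own, the frozen "text encoder" is just a dictionary of target vectors and the loss is plain L2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen target embeddings, standing in for what "front view", "side view",
# etc. mean inside the real text encoder.
true_dirs = {d: rng.normal(size=16)
             for d in ['front', 'side', 'back', 'overhead', 'bottom']}

# Learnable pseudo-token embeddings, one per direction concept.
# Textual inversion optimizes only these; the rest of the model is frozen.
learned = {d: rng.normal(size=16) for d in true_dirs}

lr = 0.1
for _ in range(500):
    for d, target in true_dirs.items():
        # Real textual inversion uses the diffusion denoising loss;
        # here a simple L2 distance to the frozen target embedding.
        grad = 2.0 * (learned[d] - target)
        learned[d] -= lr * grad
```

Fine-grained degree adjustments could then, in principle, be represented by interpolating between the learned direction embeddings rather than by discrete words.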