top | item 37695530

Show HN: Generative Fill with AI and 3D

360 points | olokobayusuf | 2 years ago | github.com | reply

Hey all,

You've probably seen projects that add objects to an image from a style or text prompt, like InteriorAI (levelsio) and Adobe Firefly. The prevalent issue with these diffusion-based inpainting approaches is that they don't yet have great conditioning on lighting, perspective, and structure. You'll often get incorrect or generic shadows; warped-looking objects; and distorted backgrounds.

What is Fill 3D? Fill 3D is an exploration on doing generative fill in 3D to render ultra-realistic results that harmonize with the background image, using industry-standard path tracing, akin to compositing in Hollywood movies.

How does it work?

1. Deproject: First, deproject the image to a 3D shell using both geometric and photometric cues from the input image.
2. Place: Draw rectangles and describe what you want in them, akin to Photoshop's Generative Fill feature.
3. Render: Use good ol' path tracing to render ultra-realistic results.
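To make the three steps concrete, here's a minimal skeleton of that flow. Every class and function below is illustrative only (the real Fill 3D internals aren't public); the deproject and render steps are stubbed out.

```python
from dataclasses import dataclass, field

@dataclass
class Placement:
    rect: tuple   # (x, y, width, height) in image pixels
    prompt: str   # e.g. "mid-century leather armchair"

@dataclass
class Scene:
    source_image: str
    placements: list = field(default_factory=list)

def deproject(image_path: str) -> Scene:
    """Step 1: estimate a 3D 'shell' of the room from the photo
    (geometry, lighting, camera) -- stubbed here."""
    return Scene(source_image=image_path)

def place(scene: Scene, rect: tuple, prompt: str) -> None:
    """Step 2: map a 2D rectangle into the 3D scene and attach a prompt."""
    scene.placements.append(Placement(rect, prompt))

def render(scene: Scene) -> str:
    """Step 3: path-trace the populated scene back into an image -- stubbed."""
    return f"render of {scene.source_image} with {len(scene.placements)} object(s)"

scene = deproject("empty_room.jpg")
place(scene, (120, 300, 400, 250), "velvet sofa")
result = render(scene)
```

The key design point is that the generation happens in a real 3D scene, so shadows and perspective come from the renderer rather than from a diffusion model's prior.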

Why Fill 3D?

+ The results are insanely realistic (see the video in the GitHub repo, or on the website).
+ Fast enough: generations currently take 40-80 seconds. Diffusion takes ~10 seconds, so we're slower, but for the level of realism, it's pretty good.
+ Potential applications: I'm thinking of virtual staging in real estate media. What do you think?

Check it out at https://fill3d.ai

+ There's API access! :D
+ Right now, you need an image of an empty room. Will loosen this restriction over time.

Fill 3D is built on Function (https://fxn.ai). With Function, I can run the Python functions that do the steps above on powerful GPUs with only code (no Dockerfile, YAML, k8s, etc), and invoke them from just about anywhere. I'm the founder of fxn.

Tell me what you think!!

PS: This is my first Show HN, so please be nice :)

102 comments

[+] LeonM|2 years ago|reply
I am impressed by the tech, but appalled by the possibilities.

Where I live, it is already common practice for real estate 'agents' to photoshop the properties listed for sale to make them look fully renovated and furnished. When in reality the house is empty and in very bad shape.

This tech will make it even harder to judge a property without actually viewing it in real life.

I think we can no longer stop tech like this from being used in ads (because that's effectively what property listings are nowadays). The only solution I think is policies/laws that prevent real-estate marketplaces from showing fake pictures.

That all said, I think the author can make big money from realtors by selling this tech as a subscription model.

[+] linsomniac|2 years ago|reply
I think we already have laws around misrepresenting things for sale... As far as furnishings: that's definitely spelled out in the contracts for what is included.

I'm sure it varies from area to area, but the biggest thing I see in our area is things like adding sunsets in the windows or behind the property in photos. But we wouldn't necessarily know if a Realtor had photoshopped out mold or water damage or the like.

[+] smrtinsert|2 years ago|reply
They've had staging photoshop forever.
[+] matsemann|2 years ago|reply
The example looks very good. Do you have more images to share? I think more examples would be nice to show off more of what it can handle. Different room types, interiors etc.

Also in that regards: I'm curious about what it can't handle. Any situations where it borks?

[+] reichardt|2 years ago|reply
Amazing! The inserted objects are renders of textured 3D models and not generated by a diffusion model + ControlNet? Is there a fixed set of textured 3D models available or are they generated on the fly based on the prompt?
[+] olokobayusuf|2 years ago|reply
That's correct! Right now, we're using the BlenderKit catalog, but we can expand beyond it. When you type a prompt and search though, that's actually doing a multi-modal search (so you can ask for a 'red painting' and it'll actually find a red painting), so it's way more accurate than a regular search. AI everywhere!
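The multi-modal search described here is typically done by embedding both the text prompt and every asset into a shared vector space (e.g. with a CLIP-like encoder) and ranking by cosine similarity. A toy sketch, with hand-made stand-in embeddings instead of a real encoder:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings": in practice these come from a multi-modal encoder.
assets = {
    "red_painting":  [0.9, 0.1, 0.0],
    "blue_sofa":     [0.1, 0.8, 0.3],
    "oak_bookshelf": [0.0, 0.2, 0.9],
}

def search_assets(query_embedding, catalog):
    """Return asset names ranked by similarity to the query embedding."""
    return sorted(catalog,
                  key=lambda name: cosine(query_embedding, catalog[name]),
                  reverse=True)

# A query embedding near "red painting" retrieves it first.
ranked = search_assets([0.95, 0.05, 0.0], assets)
```

Because the ranking works on semantics rather than keyword overlap, a prompt like "red painting" can match assets whose names never mention "red" at all.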
[+] mentos|2 years ago|reply
My use case for this would be for decorating my apartment.

I’ve got a big empty studio with a bed and couch I’ve already purchased but trying to figure out what to fill in for all the other gaps. Coffee table, media console, tv or UST projector, bar or bookshelf or desk.

Would be nice if there was a way to populate it with items/products that can be purchased and aren’t purely conceptual.

[+] mft_|2 years ago|reply
I’ve not tried it yet, but came across this site the other day which meets your use case: https://aihomedesign.com

(No affiliation!)

[+] RockRobotRock|2 years ago|reply
Have real estate companies considered leaving a house unfurnished and letting potential buyers put on AR goggles to see what it would look like with their furniture?
[+] sci_prog|2 years ago|reply
Or could just use a phone/tablet as a "viewport". I know it wouldn't be as immersive but the barrier to have it adopted would be a lot lower.
[+] thih9|2 years ago|reply
This level of realism seems impossible in AR as of today, if path tracing a single frame takes a minute or more.
[+] qingcharles|2 years ago|reply
Will it work with decks and porches?

I have images of decks and porches that need staging for the construction company's web site.

[+] billconan|2 years ago|reply
I tried the demo, it seems to be buggy and it seems to only allow you to choose existing items from a predefined db.
[+] olokobayusuf|2 years ago|reply
What bugs did you encounter? And yes, because we're using actual 3D models, there's a fixed set of models (right now, just under 300). Because the priority is ultra-realism, the current state-of-the-art for 3D model diffusion won't cut it (see OpenAI Point-e https://github.com/openai/point-e).
[+] bsenftner|2 years ago|reply
Between Fill3D's architecture that 'path traces to render ultra-realistic results' and fxn.ai transparent deployment capability... I gotta say this is super impressive work. I can use both in a current project, and will be investigating.
[+] linsomniac|2 years ago|reply
What are you thinking is your business model? I'm a sysadmin at a small MLS, trying to figure out where we'd integrate it. At $2/stage it's something we'd probably have to have you bill the Realtor directly for (I don't think we do any pass-through billing), but could maybe include a couple stages per month per Realtor. I could see a fun use-case where consumers would be able to do their own staging, but there are probably few if any Realtors that will be willing to pay $2/stage for consumers to do that.
[+] olokobayusuf|2 years ago|reply
Would love to have a proper convo on this. With bulk pricing, I can reduce the price by quite a lot. Eventually, the goal is to have users be able to stage themselves in your property website or MLS. Please shoot me a note at [email protected] !
[+] pedalpete|2 years ago|reply
Now create a bunch of perspectives, and NeRF or Gaussian-splat that, and you've got a fully immersive 3D scene that is better than any rendering.
[+] jayd16|2 years ago|reply
Why is it better than any rendering?
[+] sourabh03agr|2 years ago|reply
The demo looks amazing! Congrats on your first Show HN. Quick question on the technical side: do you generate the (added) objects in 3D directly, or generate them in 2D and deproject them to 3D? If the former, which foundation model are you using?
[+] aantix|2 years ago|reply
Is there any way to remove objects from an initial image, so that then it can be utilized for staging?
[+] kderbyma|2 years ago|reply
Love the project, great work! Can you think about adding some ethical clauses to your license? Something that allows people to use it for good, wholesome purposes, but avoids letting it be used by scammers faking Airbnb listings, for example.
[+] doix|2 years ago|reply
If someone is willing to scam people on Airbnb, I'm pretty sure they're willing to break a software license.
[+] olokobayusuf|2 years ago|reply
This is a very good point. Thanks for bringing this up!
[+] init2null|2 years ago|reply
Wouldn't that qualify as a crime already? That sounds like fraud to me.
[+] ralfhn|2 years ago|reply
> virtual staging in real estate media

If you can make this work with exteriors, landscaping design is huge. Maybe start with something simple like desert landscaping (which is really just rocks, turf, pavers, maybe small palm trees).
[+] philipov|2 years ago|reply
It looks like a cloud-only app. If it doesn't run entirely locally, it's useless to me. Shipping my data to an external data processor is a security risk I'm not allowed to take.
[+] olokobayusuf|2 years ago|reply
That's fine. Path tracing in a browser is pretty impractical today anyway. Check back in a few years, when WebGPU is much more mature.
[+] blovescoffee|2 years ago|reply
Could you speak more to the "deprojection" step? What is that?
[+] olokobayusuf|2 years ago|reply
Fill 3D takes a different step from diffusion, in that it tries to build an actual 3D scene (kinda like a clone) of what's in the image you upload. In some sense, that's actually the most fundamental representation of what's in your image (or said another way, your image is just a representation of that original scene).

So it works by trying to estimate a 3D 'room' that matches your image. Everything from the geometry, to the light fixtures, to the windows. It's heavily inspired by how humans (weird to contrast 'human' vs. AI work) do image/video compositing.

TL;DR: Image in, 3D scene out.
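The geometric core of deprojection is the inverse of the pinhole camera model: given a pixel, an estimated depth, and the camera intrinsics, you can back-project that pixel into a 3D point. This is a minimal illustration of that one step (the intrinsic values are made up), not Fill 3D's actual code:

```python
def deproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into camera-space 3D
    coordinates, inverting the pinhole projection:
        u = fx * x / z + cx,   v = fy * y / z + cy
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example: a 640x480 image with principal point at the center and a
# focal length of 500 px (illustrative numbers).
point = deproject_pixel(320, 240, 2.0, 500, 500, 320, 240)
# The pixel at the principal point maps straight down the optical axis.
```

Doing this densely across the image (with depth and lighting estimated from photometric cues) is what yields the 3D "room" that objects are then placed into and path-traced against.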

[+] artursapek|2 years ago|reply
Wow, nice. I hope you charge realtors a fat price for this
[+] moritonal|2 years ago|reply
You realise this is the role of entire teams at certain companies, right? If you automate enough parts, you'd be able to automate the work of 30 people per company doing this. You're not the first to work this out, either.

https://investor.wayfair.com/news/news-details/2023/Wayfair-...

[+] olokobayusuf|2 years ago|reply
Decorify from Wayfair is also using diffusion, same as the other folks who have built similar things in the market (InteriorAI is probably leading product here). We'll see where this goes :D