top | item 24131290

Atlas: End-to-End 3D Scene Reconstruction from Posed Images

113 points | taylored | 5 years ago | github.com | reply

25 comments

[+] cs702|5 years ago|reply
From the folks at Magic Leap. It looks remarkably good to me.

The video at https://www.youtube.com/watch?v=9NOPcOGV6nU&feature=youtu.be is worth watching, especially the parts showing how the model gradually constructs and improves a labeled 3D mesh of a live room as it is fed more visual data by walking around the room.

--

On a related note, Magic Leap has been trying to find a buyer for the business for several months now:

https://www.roadtovr.com/report-magic-leap-buyer-sale/

https://www.bloomberg.com/news/articles/2020-03-11/augmented...

[+] caiobegotti|5 years ago|reply
I have no experience in this field at all, and they note in the video that the sequence shown was not realtime, but I wonder how far we are from having something like this run in realtime, or how "realtime" it could be given fancy hardware to be used in the wild?
[+] kanobo|5 years ago|reply
On a tangential thought, it's interesting to me that a company (Magic Leap) that has raised several billion dollars generates so little value compared to other companies its size that this is their most notable output in a year; I thought it was a PhD project until I looked at the project owner. Anyway, it's a very interesting project, and thanks for sharing.
[+] rayuela|5 years ago|reply
Yeah, I have to agree. If this were a PhD thesis it would certainly deserve some praise, but given that the most exciting thing to come out of Magic Leap in years just barely puts them on par with the state of the art... well, I would be pretty pissed if I were an investor in them.
[+] Theodores|5 years ago|reply
It is easy to have a billion dollar company if you have borrowed two billion and haven't spent all the loot on salaries.

The company has a cool name and the product area is divisive. Some say it is vapourware and that nobody wants Oculus Rift style VR. Others are gung-ho. It's like Bitcoin all over again.

Although this tech is being done with AI, it was being done with non-AI approaches two decades ago for movies/TV. But it wasn't as if people ported that tech to their smartphones from the SGI desktop monsters of yesteryear.

[+] pen2l|5 years ago|reply
Here's a challenge question for the folks reading this who know the tools of the trade (my apologies in advance for somewhat hijacking the thread): consider this video of an endoscopy: https://www.youtube.com/watch?v=DUVDKoKSEkU -- say, from 3:00 to 5:00. I have a bunch of movies (i.e., a series of images!) and I want to do a 3D reconstruction of this.

It seems super, super difficult... there are free-flowing liquids, and this is an esophagus/upper lining of the stomach, which changes form quite drastically and quite often. How would you guys approach this problem?

[+] ghj|5 years ago|reply
Even more hijacking, I remember thinking medical applications were going to be the killer apps for VR. I was blown away by these demos almost half a decade ago https://youtu.be/MWGBRsV9omw?t=251

Did they ever make it into real life practice?

[+] tibbon|5 years ago|reply
I wonder how long it's going to be before we're able to run a significant portion of YouTube video (tourist videos, etc.) through something like this and generate a huge 3D mesh of the world. Combined with Street View data, you'd really have a ton of spaces covered.
[+] pfranz|5 years ago|reply
I believe random videos are of too low a quality. Like this project, most of the work I've seen uses constrained video.

I have seen random still images used for this kind of thing: https://nerf-w.github.io/

I haven't heard of any equivalent of EXIF for video. That goes a long way when trying to make sense of random video, both for camera settings and for GPS location if you're trying to correlate multiple videos.

[+] tetris11|5 years ago|reply
Google will do this, and then sell the data to security institutions. We will be told about it later, or consent to it during a Terms&Conditions update.
[+] welfare|5 years ago|reply
Cool idea, but how would you keep it maintained? It's tricky enough to keep maps up to date. A 3D Mesh would be even more complex to maintain.
[+] bl0b|5 years ago|reply
Looks awesome. Given it takes position data along with images, how accurate must the position data be? Could it handle something like sensor drift in the position data over time?
[+] toomuchtodo|5 years ago|reply
For anyone with domain knowledge, how applicable is Google's NeRF work here in comparison? Is there any overlap?

https://nerf-w.github.io/

https://news.ycombinator.com/item?id=24071787

EDIT: @bitl: Tremendous, thanks for the reply. Would be amazing to be able to build these scenes just by walking around scanning a room with your mobile phone while it records video for processing the frames into scenes (especially considering mobile platforms with a depth sensor for enrichment of the collected data).

[+] nestorD|5 years ago|reply
By default NeRF does not produce a mesh (though one could use marching cubes, as Atlas does), and it requires training a neural network for each scene, whereas Atlas (as far as I understand it) uses a pretrained network to process new scenes.

NeRF would probably produce a much better final result, but the Atlas approach (no need to train anything from scratch) is the only one that can hope to run in real time, which is vital for some applications.
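To make the marching-cubes point concrete: you can extract a mesh from any density field you can sample on a regular grid, whether it comes from a trained NeRF or anywhere else. A minimal sketch using scikit-image, with an analytic sphere standing in for a trained model (the `density` function is made up for illustration):

```python
import numpy as np
from skimage import measure

# Hypothetical stand-in for querying a trained NeRF's density:
# positive inside the unit sphere, negative outside.
def density(x, y, z):
    return 1.0 - np.sqrt(x**2 + y**2 + z**2)

# Sample the field on a regular 64^3 grid over [-1.2, 1.2]^3.
n = 64
lin = np.linspace(-1.2, 1.2, n)
X, Y, Z = np.meshgrid(lin, lin, lin, indexing="ij")
vol = density(X, Y, Z)

# Marching cubes extracts the level set density == 0 as a triangle mesh,
# exactly the step one would bolt onto NeRF's learned density.
verts, faces, normals, values = measure.marching_cubes(vol, level=0.0)
print(verts.shape, faces.shape)  # (V, 3) vertices, (F, 3) triangle indices
```

The catch for NeRF is that every grid sample is a network query, so a dense grid means millions of forward passes per scene; Atlas's feed-forward design sidesteps that.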

[+] bitL|5 years ago|reply
NeRF has the potential to make all those classical methods obsolete, though it requires many input images, and I am not sure how it handles rolling shutter and other distortions.
[+] nickponline|5 years ago|reply
Is there anything that would prevent this approach working on 360 video?
[+] TaylorAlexander|5 years ago|reply
In theory this should work. I've been doing photogrammetry with spherical video, and existing software packages often want to "dewarp" the image onto a plane, which works fine for a narrow field of view but fails on spherical video. It would be interesting to see if Atlas supports spherical input. Also, 360 cameras have pretty low visual acuity: my 5.6K GoPro Fusion has to divide those pixels across the whole field of view, so images are less detailed. Still, I think 360 video can be useful in photogrammetry with the right algorithms.
[+] jayd16|5 years ago|reply
Worst case, you can sample the 360 frames to get images with a smaller field of view. However, the app takes in camera intrinsics and positional data so it seems like it would work out of the box.
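The worst-case path above (resampling 360 frames into narrow-FOV views) is mechanical: for each output pixel, build a pinhole ray, rotate it toward the desired heading, and look up the matching longitude/latitude in the equirectangular frame. A rough NumPy sketch, not any particular tool's API; the function name and defaults are invented, and it uses nearest-neighbour sampling for brevity:

```python
import numpy as np

def equirect_to_pinhole(equi, fov_deg=90.0, out_size=256, yaw=0.0, pitch=0.0):
    """Sample a perspective (pinhole) view out of an equirectangular frame."""
    h, w = equi.shape[:2]
    # Focal length in pixels for the requested field of view.
    f = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)

    # Ray direction for each output pixel, camera looking down +z.
    u = np.arange(out_size) - out_size / 2 + 0.5
    v = np.arange(out_size) - out_size / 2 + 0.5
    uu, vv = np.meshgrid(u, v)
    dirs = np.stack([uu, vv, np.full_like(uu, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays by yaw (about y) then pitch (about x).
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    dirs = dirs @ (Ry @ Rx).T

    # Longitude/latitude of each ray indexes the equirectangular image.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])   # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))  # [-pi/2, pi/2]
    x = ((lon / np.pi + 1) / 2 * (w - 1)).astype(int)
    y = ((lat / (np.pi / 2) + 1) / 2 * (h - 1)).astype(int)
    return equi[y, x]
```

Sweeping `yaw` around the circle yields a ring of virtual pinhole cameras per frame, each with known intrinsics (`f`, principal point at the image center), which is the form a pipeline like Atlas expects.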
[+] exit|5 years ago|reply
I imagine a lot of unfortunate artefacts come from stitching together the camera views that form a 360 or "spherical" image.