top | item 30686027

Launch HN: Vimmerse (YC W22) – Platform and SDKs to create and play 3D video

67 points| jillboyce | 4 years ago | reply

Hello HN! We are Jill and Basel, founders of Vimmerse (https://vimmerse.net) Vimmerse is a platform and SDKs for creating and playing 3D video. We make it easy for businesses, developers, and creators to provide 3D immersive experiences for their viewers using our content creation APIs and player SDKs.

We have been watching video in two dimensions for too long! Most 3D content people experience today is computer generated, such as VR games. Diverse use cases can benefit from real-world 3D video, such as music performances, training, family memories, and the metaverse. Why isn’t there more real-world 3D video? Because it has been difficult and expensive to create, stream, and playback real-world, camera-captured 3D video content.

I am an IEEE Fellow and inventor of over 200 patents. Basel has a PhD in electrical engineering with deep experience in VR/AR/3D video. While at Intel (as Chief Media Architect and Intel Fellow), I led an MPEG standards workgroup on 360/VR video. I found that 360/VR video’s limitation to 3 Degrees of Freedom (DoF) caused discomfort or even nausea in some viewers, because we experience the real world in 6 DoF (controlling both position + orientation), not in 3DoF (just orientation). I initiated an activity in MPEG to develop the MPEG Immersive Video (MIV) standard, which provides 6DoF. I became the lead editor of the MIV standard, and Basel was the lead editor of the test model.

While at Intel, we developed a MIV 3D video player for Intel GPUs and observed the greater engagement that 3D video provides to viewers. However there was no content available for the new MIV standard, and creation of 3D video content was a very difficult and expensive process. We realized that if 3D video were to become widely used, the creation and distribution processes needed to be simplified. We founded Vimmerse with a mission to greatly expand access to 3D video.

Businesses can build their own services using our APIs to upload captured content and prepare 3D video on our platform. Our platform is capture agnostic, meaning it can work with any video device suitable for 3D capture, such as iPhones or Microsoft Azure Kinect depth sensors. More than 60% of iPhone 12 and 13 models sold (Pro and Pro Max) have LiDAR depth sensors, which can be used for capturing 3D video content.

The Vimmerse platform prepares 3D content from the uploaded capture files. Our approach is built on top of industry standard video streaming protocols and codecs, so existing video streaming servers and hardware video decoders can be utilized. The content preparation platform creates two types of output bitstreams from the uploaded captures: bullet video and 3D video. Bullet video (named after the Matrix movie’s bullet effect) is a 2D video representation of the 3D video, following a predetermined navigation path selected by the content creator. 3D video gives viewers the ability to control navigation with 6 Degrees of Freedom (6DoF), where they can pan around or step into the scene. Bullet video may be streamed (HLS) or downloaded (MP4) for playback on any device. 3D video playback may be streamed (HLS) to the Vimmerse 3D video player.

Services may use the Vimmerse 3D video player app, or developers can use our player SDK inside their own apps. Viewers have the ability to control navigation using any viewer input method: device motion, mouse/keyboard, touch controls, head/gesture tracking. The player SDK renders views for the selected 6DoF position and orientation.

We haven’t published pricing yet, but our plan is to charge for our content preparation APIs based on usage (e.g. minutes of video processed and streamed) and player SDKs based on number of units.

The Vimmerse website https://vimmerse.net provides a no code way to test out our platform or view featured content. We invite the community to upload their own test content. Instructions for preparing content are available at https://blog.vimmerse.net/freeport-platform-usage-instructio.... Sign up for an account to upload content, or use the guest account (login: guest, password: Guest#123). The Vimmerse 3D player for Android is available in the Google Play Store at https://play.google.com/store/apps/details?id=net.vimmerse.p....

Please share your thoughts and experiences with 3D video, and your ideas for use cases that would benefit the most from 3D video. Are there any features we should add, or capture devices that you would like to have supported? Looking forward to getting your feedback.

32 comments

[+] pavlov|4 years ago|reply

Congrats on the launch!

I've been very interested in this field since the original Kinect came out and it was discovered you can easily get the raw RGB+D data over USB. I made a bunch of prototypes and had ambitious plans to make a depth-enabled short movie for my film school MA degree project, with a player on the original iPad and head tracking for a parallax effect. (A project much too ambitious; I never finished the player or the movie or the degree.)

IMO you should hire an in-house artist right away, assuming you have funding. Right now your Featured Content section looks like demo footage from academic papers. For someone uninitiated, this content doesn't really present the potential of immersive video.

You should have someone with enough artistic vision and technical competence whose primary job is to come up with new exciting demos of your technology. That person will need patience and a start-up mentality because the content production pipeline for something this cutting-edge is probably very foreign to most artists. But it would make a huge difference in how people can understand your project.

(If I were twelve years younger I'd send you a job application right away. Working on this would have been my dream job.)

[+] BSalahieh|4 years ago|reply

Thank you for the comment, actually we have recently captured some interesting content that will clearly demonstrate the potential of immersive video... Those are in the make so stay tuned to see what is coming next :) As long you have the passion, this means you are young (with experience on top) for the technology. Great feedback and if you still have the kinect DK around, you may even shoot and upload some content!

[+] jillboyce|4 years ago|reply

I agree that we do need some better demo content that can better illustrate the potential of 3D immersive video. We are working on that right now. I encourage anyone who has some interesting content or can make some to contact me.

[+] koinexpert|4 years ago|reply

I'm curious why you are referring to 'Kinect'. And because I also agree that the website needs a lot of work to make it more obvious of what the software can actually do, I was wondering if you could answer me this:

Can their software measure depth and height? As in, if I take a 3D picture of an object, can it tell me the dimensions. As in, can this thing give me a computer model of a 3d object on my screen?

[+] reggieband|4 years ago|reply

Maybe a bit in the weeds on the technical side ... but any reason you went with HLS vs. dash? I spent a good few years streaming video (2d) and my personal preference was HLS (mostly due to the intransigence of Apple and thereby its ubiquity) but a lot of people on the cutting edge of that field were hyping up dash. Was that a technical decision? I mean, I know of ways to support both in regular video (excepting the schism in DRM technologies) so I assume it must be technically possible to do it in 3d.

Also, just an FYI but Meta (nee Facebook) is doing the hard-press on LinkedIn for engineers with streaming experience specifically looking for AR/VR video streaming. How do you fit into that eco-system? Video is powerful because it is everywhere ... on Reddit, in my Facebook feed, on Instagram not to mention YouTube. It took a while for that to happen - but is there an on-ramp to similar services in your roadmap?

I've played a bit with 3d video, even bought a VR 180 camera. You mention one deficiency of VR 360 you sought to solve was that it lacked in degrees of freedom. My own experience was that VR 180 was significantly more interesting. I honestly don't want to swivel my head around much when consuming content and VR 180 at least keeps the majority of the action right in front of me. It feels more appropriate in many ways for many purposes, including a Zoom like VR conferencing setup. I would assume it uses less data as well which seems like a win.

Finally, AR feels like such an interesting area in this space. Do your formats include any kind of objects/meshes/shaders/etc? Now that I think about it, I'm even more interested in 3d spacial audio. Do either of these figure into your offering?

[+] jillboyce|4 years ago|reply

We went with HLS instead of DASH because of easier iPhone integration, but we can certainly offer DASH support if there is sufficient customer demand. Our technical approach works just as well for DASH as for HLS.

Because our approach is built on existing video streaming protocols/servers and video codecs, we think it is a straightforward step to add 3D video streaming to existing 2D video streaming services. As you say, 2D video is everywhere now. We envision a future where 3D video also becomes ubiquitous.

With our system, 180 or 360 cameras can be used. It is up the creator to decide what range of volume to capture, what type of cameras, how many cameras, etc., which determines the range of motion is supported for the viewer.

It is on our roadmap to allow augmentation of real-world 3D video with objects/meshes like in AR, except instead of augmenting your current local scene, you can augment a remote scene (or remote in time scene).

Spatial audio would also be a very useful feature. We are video experts, not audio experts, so would plan to work with a partner to offer support for spatial audio in the future.

Thanks for your comments. It's great to hear what features people are most interested in.

[+] BSalahieh|4 years ago|reply

I love this discussion... I second Jill in every thing she said. Our solution makes it straight forward to support immmersive video using legacy video servers (current infrastructure and bandwidth limits). Besides, we already have support for perspective and equirectangular projections and we plan to add more whenver there requests from creators / customers. Supporting MIV / Kinect / iPhones captures are just the begining and we definetly are going to add support for other volumetric representations in the future.

[+] jzer0cool|4 years ago|reply

Intrigued by the number of patents you have, 200! I like to understand the process of submitting a simple one. Could you share costs and share process so I can get habit of attempting to submit one? Has any patents come to fruition in terms of royalties, $$, etc., do you mind sharing?

[+] jillboyce|4 years ago|reply

All of my granted patents are owned by my previous employers, who also paid all expenses involved in submitting the patent applications, and collect any royalties. Some of my patents are essential for video codec standards, including H.264/AVC and HEVC.

It was a change of mindset to apply for my first patent for Vimmerse.

If you are in the US, the USPTO has a good overview here: https://www.uspto.gov/patents/basics/patent-process-overview

Application fees are lower for small entities or micro entities. https://www.uspto.gov/learning-and-resources/fees-and-paymen...

But the biggest cost is paying for a patent attorney to prepare the application. If you want to try to do it on your own, I found this book very helpful: https://www.amazon.com/Invention-Analysis-Claiming-Patent-La...

Good luck!

[+] BSalahieh|4 years ago|reply

Yes, Proud to have Jill as my co-founder :)

[+] opan|4 years ago|reply

No relation to vim then? I guess it's v for virtual and then "immerse".

[+] jillboyce|4 years ago|reply

No relation to vim. Yes, Vimmerse = V + immerse. V is for video or volumetric or virtual...

[+] almog|4 years ago|reply

"Ah, finally a VR platform where decent people (and otherwise) can immerse themselves in vim..." I thought to myself while deep inside me I knew it was destined to be a short road to disappointment and heartbreak.

[+] xcambar|4 years ago|reply

As a typical HN member, I too saw Vim first.

But that's a very biased population for sure!

[+] soylentgraham|4 years ago|reply

Any plans for web & webxr playback?

Do you have encumbering patents for depth+colour video playback? Why not give away the player to aid/standardise adoption of the format?

A lot of people are using this format (including myself [0]) and it seems nobody is really pushing a standard/shared player for web, unity, unreal, etc still; it feels to me we could all move along a bit faster bu charging for capture and freeing playback.

[0] https://panopo.ly

[+] hyferg|4 years ago|reply

This seems like a step in the right direction away from sphere-projected 'immersive' video. Very cool and can't wait for the 6dof VR player to be ready!

[+] jillboyce|4 years ago|reply

Yes, sphere-projected VR/360 video isn't immersive enough given its limitation to 3 DoF. 6DoF video feels so much more immersive.

[+] BSalahieh|4 years ago|reply

The motion parallx supported here makes it more natural to consume such content and when you drive the motion yourself (using our Vimmerse player) this takes you to a whole new level. For some content, you really feel that you are present there!

[+] DonHopkins|4 years ago|reply

Will there be an Emacsmerse compatibility mode? ;)

But more seriously (but still on the spirit of Emacs), how will your 3D video player be scriptable and extensible at runtime?

I have some positive experience with extensible 3D panoramic stereo videos players, which I'd love to share with you:

I developed a system for JauntVR that enables cross-platform scripting of Unity3D applications in JavaScript, for Jaunt's 3D panoramic video player. It also greatly eases and accelerates development and debugging, which is otherwise extremely slow and tedious with Unity.

Unfortunately JauntVR ran out of money for that project and pivoted to other stuff before publishing it, but Arthur van Hoff generously let me open source the general purpose Unity/JavaScript scripting system I'd developed for it, which I call "UnityJS".

It's useful for all kinds of other things beyond scripting 3D videos, and it works quite nicely not just on mobile devices but also with WebGL, but scripting interactive 3D video was the problem I originally created it to solve.

I've used for several applications and integrated it with several libraries, including financial data visualization on WebGL, TextMeshPro in Unity, ARCore on Android, and ARKit on iOS.

UnityJS also makes developing and debugging scriptable Unity apps much easier and quicker (by orders of magnitude) because you can use the standard JavaScript debuggers (even on mobile devices) and just restart the app to reload new code in seconds, instead of waiting the 10-60 minutes it takes to totally rebuild the app in Unity.

By using the same standard built-in JavaScript engine of the web browser on each platform, you can implement cross-platform interfaces in JavaScript instead of re-tooling the interface for each platform in different languages, and incorporate standard up-to-date off-the-shelf JavaScript libraries.

I believe the 3D video player should be much more like an extensible scriptable browser (and like Emacs and NeWS!), instead of just a dumb fixed-function play/pause/rewind/fast-forward video player (like piping a text file through "more" on a VT100).

So you can download dynamic content and scripts that implement custom interfaces for 3D videos, like interactive titles and credits, custom menus, theater lobbies, even networked multi player environments for watching 3D videos together.

It's great fun to overlay live 3D graphics, text, physics simulation, interactive kinetic typography, particle systems, etc, into 3D videos.

It's extremely useful for selecting and navigating content, displaying interactive titles and credits, or anything else you might imagine, like games and applications embedded in and orchestrating 3D video.

And of course you can publish new content, hot updates, and extend the experience by downloading new code and data over the internet, instead of pushing out a new version of the app through the app store every time you want to improve the experience or publish a new video or series.

Some obvious and useful application for 3D videos is designing compelling opening and closing credits, captions, subtitles, pop-up notifications, pie menus, player controls, navigation, bookmarking, tagging, and any other kinds of user interfaces, that can adapt to the content and environment in whatever direction you're currently looking, so you don't miss them and they're not unusable, just because you were looking in the wrong direction at the wrong time.

3D videos suffer practically and creatively from hard-coded static title sequences and credits, because you might not be looking in the right direction at the right time to see them. It's like pointing a movie camera at a stage to film a play, instead of inventing a whole new expressive cinematic language appropriate for the new medium. And I'm not just talking 3D scrolling Star Wars credits!

Of course you could just build a dumb non-extensible 3D video player with just one hard-coded style of menus and scrolling credits, or burn the title sequence and credits into the 3D video itself, but that would be static, boring, and inflexible, because the title sequence itself is an important thematic art form, which should ideally be designed for each different video, collection or channel.

The point is to enable the infinite creative possibilities of interactive titles and credits that respond to the viewer's attention and interest: scrolling back and forth, revealing more information about the names you focus on in the credits, and all kinds of other cool innovations that run-time scripting, dynamic downloadable content, and interactive 3D graphics enable.

Saul Bass realized that 2D movie title sequences could be so much more than simply scrolling a list of names up the screen, and he designed the groundbreaking opening credits to movies by Hitchcock, Scorsese, Kubrick, and Preminger. Imagine how intense the interactive panoramic opening titles of a 3D film like Psycho or North by Northwest could be, to set the mood for the rest of the movie!

https://en.wikipedia.org/wiki/Saul_Bass

>During his 40-year career, Bass worked for some of Hollywood's most prominent filmmakers, including Alfred Hitchcock, Otto Preminger, Billy Wilder, Stanley Kubrick and Martin Scorsese. Among his best known title sequences are the animated paper cut-out of a heroin addict's arm for Preminger's The Man with the Golden Arm, the credits racing up and down what eventually becomes a high-angle shot of a skyscraper in Hitchcock's North by Northwest, and the disjointed text that races together and apart in Psycho.

Psycho's opening credits

https://www.youtube.com/watch?v=hwq1XHtJEHw

Saul Bass: North by Northwest (1959) title sequence

https://www.youtube.com/watch?v=1ON67uYwGaw

The Art of Movie Title Design | Saul Bass and Beyond

https://www.youtube.com/watch?v=Q_Mo0MqICXI

Saul Bass- Style is Substance

https://www.youtube.com/watch?v=frWLpyI3lXY

>"I have felt for some time that the audience involvement with the film should really begin with the first frame. You have to remember that until then, titles tended to be just some dull credits, mostly ignored, or used for popcorn time. So there seems to be a real opportunity to use titles in a new way. To actually create a climate for the story that was about to unfold." -Saul Bass

UnityJS is useful for other stuff too: Here's a great example of an interactive WebGL based business model and structured financial data visualization system that I developed with UnityJS for Jen van der Meer of Reason Street.

The lion's share of the code is written in JavaScript, plus a bunch of generic and bespoke modular Unity components and prefabs that JavaScript plugs together and orchestrates. On top of Unity's 3D graphics, it uses the canvas 2D API and D3 to draw the user interface, info overlay panels, and translucent pie menus with live tracking feedback, all written in JavaScript so it's easy to iteratively develop and debug!

It's totally data driven by JSON data and CSV spreadsheets. So Jen can just enter fresh data into google sheets, turn the crank, and play around with the interactive visualizations within seconds. And I can just as quickly and easily modify the code or data, then quickly run and debug the new version!

Amazon FY2018 Visualization:

https://www.youtube.com/watch?v=J8uOjelouUM

>"I created this tool as a way to kind of tell the story of how companies shift from e-commerce, and build these combinatorial, complex, technical system type business models. It's a little bit of an obsession of mine. If you're curious about this project you can find me in the links below. I'll be collaborating with my long time collaborators here to see how complex companies and ecosystems [work], and help us all better understand how to think about these technology business models, and how they so substantially shape our world." -Jen van der Meer

Apple Services Pivot:

https://www.youtube.com/watch?v=TR_4w7OCW4Y

Facebook FY 2018 Financials Visualization:

https://www.youtube.com/watch?v=cMQgjmj_mnQ

Here are some link to some doc, notes, and posts about UnityJS I've written. And if you're interested in using it, I have a more recent, modular, up-to-date version that I'm refactoring to use the Unity package system instead of git sub modules and symbolic links, and it also has some nice improvements for WebGL, and much better JSON.net<=>Unity integration:

https://github.com/SimHacker/UnityJS/blob/master/doc/Anatomy...

https://github.com/SimHacker/UnityJS/blob/master/notes/talk....

https://github.com/SimHacker/UnityJS/blob/master/notes/unity...

https://news.ycombinator.com/item?id=21932984

https://news.ycombinator.com/item?id=22689008

https://news.ycombinator.com/item?id=22691004

[+] pedalpete|4 years ago|reply

I really wanted to do some great 3D interactive text in https://ayvri.com, but we learned in the process that you need to give the viewer a target or they loose context of what the text/menu/etc is attached to, or where it exists in the scene. The best way we found to do this was to use a 2D layer. I wanted to add some "flag" type animations, and do some other cool stuff in there, but we never got around to it.

I'm curious why you're recommending UnityJS instead of just a straight WebGL implementation.

Unity is an amazing tool, but my understanding is that the assets need to be packaged with the player, and that isn't done in real-time. So adding new functionality would mean either some of your videos don't have the full capabilities, or you're recompiling them.

We've been mostly out of this space for the last two years, so it's also possible that I'm entirely wrong about that.

[+] jillboyce|4 years ago|reply

Thanks for the info about the open sourced UnityJS. I'll take a look.

We hadn't thought about making the 3D video player be scriptable and extensible at runtime, and will give it some thought.

Being able to overlay 3D graphics ( including titles) onto the 3D video is on our roadmap. Glad to hear confirmation that it will be a useful feature to add.

[+] ZhangSWEFAANG|4 years ago|reply

[deleted]

[+] jillboyce|4 years ago|reply

https://www.linkedin.com/in/jillboyce/

Lots of documentation available on the public Joint Video Experts Team (joint group between MPEG and ITU-T) website https://jvet-experts.org/ about the activities I led in 360 video.

Here is an example: https://jvet-experts.org/doc_end_user/current_document.php?i...

[+] ddlutz|4 years ago|reply

Please don't bring this toxic culture here.

[+] unknown|4 years ago|reply

[deleted]