
Runway – Create impossible video

402 points | smusamashah | 4 years ago | runwayml.com

78 comments

[+] anigbrowl|4 years ago|reply
Former film pro here. I don't see anything 'impossible' here, most of this is stuff I could do on my desk 10 years ago with a modest setup - but only with a lot of patience and manual labor.

If these demos fairly represent the user experience then it's slick as hell and further blurs the line between editor, compositor, DI specialist etc. Much will depend on whether the ML marketplace can be competitive with many mature commercial offerings from software studios that have no intention of letting their lunch be eaten. Video production involves a lot of bleeding edge technology but the client base is also suspicious of new providers or do-it-all solutions at first, and very loyal to products and tech support offerings that have got them through difficult projects in the past, so there will be a big hill to climb between people acknowledging it's cool as hell and their willingness to sell it to a producer who is making a 5, 6, or 7 figure bet on what some editor/VFX geek is telling them.

The web browser/cloud storage is an issue. It could be a plus in many circumstances, but it's also a big barrier to any production that's working in a remote location without reliable internet, especially given the massive data volumes involved. That limits a lot of the use cases to post production, and many producers and directors are going to be wary of starting on one platform and finishing on something else; nobody likes workflow changes unless they can be shown something seamless, like having your Avid/Final Cut/Premiere/* project leave your machine and show up frame-perfect in Runway, along with a definite answer about export time like being able to take in 12 hours of video and have it online within 24h. There will also have to be a lot of questions about security, downtime, and being able to get your project back out of Runway if money runs out, the editor gets fired, creative differences rear their head etc.

Looks like an instant win for short-form projects like personal shorts, music videos, commercials, corporate, demo reels, spec pieces. Potentially very good for reality TV, indie, and low-to-mid budget films once the above questions are answered. Toughest nut to crack is large budget or episodic TV, where there's very stiff competition and contractual or professional commitments already in place, but doable within 5 years.

[+] tfsh|4 years ago|reply
I've just given this a go; with very little work I was able to produce a near-seamless mask that adapted to a dynamic, changing pose - interesting stuff.

The main drawback I observed was computation speed: most edits required >20 seconds of loading/buffering (with a 1 Gbps internet connection). I presume the computation occurs on their own servers, so with beefier hardware the performance could be improved; however, my intuition is that the app would perform faster if running natively (rather than on a shared resource quota on a remote server).

In this regard, the lack of performance could be a major productivity killer; my hypothesis is that it's the video processing/manipulation that is taking a long time (and not the model inference). Many companies have tackled this problem space with dedicated video transcoding chips (YouTube, for example), but this generally still incurs long periods of waiting.

[+] eh9|4 years ago|reply
I would imagine it could be slightly easier to manage models deployed on their servers, but Adobe manages to package and ship its tools so they run natively.

My best guess is that the web was the path of least resistance and that's why this launched on the web, but as soon as they have available resources, they could package and ship a native app (hopefully one that can take advantage of modern on-chip ML acceleration).

[+] mkaic|4 years ago|reply
I'm so glad to see this on the front page -- this website, in conjunction with a video from Two Minute Papers, is what convinced me to teach myself machine learning. I come from a filmmaking background, and have used green screen extensively. It's big, bulky, has to be set up correctly, lit correctly, and is expensive and just generally a pain to work with.

I'm currently working on a fully-automatic version of this based on some excellent research from the University of Washington. It's still in its infancy, but if you'd like to follow my progress, I post occasional updates over on https://nomoregreenscreen.com

This is just the beginning of what's possible in the intersection between DL and creative filmmaking, and I'm really excited to get to be in this field at a time when compute is cheaper than ever and all the information I could want is available for free on the internet!

[+] fouc|4 years ago|reply
Interesting, how long ago did you first see runwayml & decide to teach yourself machine learning?
[+] fxtentacle|4 years ago|reply
Forgive my ignorance, but that looks like someone created a GUI for the TensorFlow Hub, which is a public collection of pretrained AI models.

AI Masking: check

AI Depth estimation: check

AI Flow estimation: check

Flicker artifacts just like the public AI models: check

EDIT: Also, this isn't actually new anymore? A quick check found two very similar startups: unscreen.com and vfx.comixify.ai. And AI rotoscoping has been part of DaVinci Resolve 17 since February: https://www.blackmagicdesign.com/products/davinciresolve/wha...

[+] endymi0n|4 years ago|reply
None of what Dropbox did was impossible before either (cue famous comment: https://news.ycombinator.com/item?id=9224 ).

Us nerds often forget that possibility and usability are not nearly the same.

Usage happens when non-technical people like editors are able to get their hands onto the technology.

[+] rikroots|4 years ago|reply
Creating a GUI for TensorFlow (or MediaPipe) is not a bad thing in itself, yes? For one thing, we need more After-Effects-like tools - particularly if they can be browser based and cheap/free to use.

The ML people don't make it easy to work with the output their models produce. I was playing with TensorFlow/MediaPipe last month, to see if I could get them to play nicely with my canvas library. The results were quite promising[1][2][3]. Still, I think making it easier for devs to use these ML models in various ways needs to be prioritized.

CodePen links (all request access to the device camera):

[1] - TensorFlow body-pix model - hide the background in various ways - https://codepen.io/kaliedarik/pen/ZEeoZaP

[2] - MediaPipe Selfie Segmentation model - hide the background in various ways - https://codepen.io/kaliedarik/pen/PopBxBM

[3] - MediaPipe Facemesh - draw on the face in real time - https://codepen.io/kaliedarik/pen/VwpGrVG

[+] adrusi|4 years ago|reply
Does this require you to upload your video assets to their servers over the internet? That seems extremely impractical for 4K or 8K footage. Even with gigabit upload speeds a clip could take several times its playback duration to transfer.
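Rough arithmetic behind that point (the ProRes-level bitrate and the uplink speed below are illustrative assumptions, not figures from Runway):

```python
# Back-of-envelope transfer time for uploading footage.
def transfer_time_seconds(bitrate_mbps: float, duration_s: float,
                          link_mbps: float) -> float:
    """Time to move a clip of the given bitrate/duration over a link."""
    return bitrate_mbps * duration_s / link_mbps

# A 60 s 4K ProRes 422-class clip at ~500 Mbps over a 100 Mbps uplink:
print(transfer_time_seconds(500, 60, 100))  # 300.0 s, i.e. 5x playback time
```

On a true gigabit uplink the same clip still takes about half its playback duration, and 8K bitrates scale that up again.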
[+] mikepurvis|4 years ago|reply
OTOH, if it's targeting (at least in part) the tiktok crowd or busy people throwing together a video slideshow for a wedding, you might be dealing with a situation where most of the source video is already in datacenters, whether FB, GPhotos, etc.
[+] everyone|4 years ago|reply
This does seem like a 'bring the mountain to Muhammad' situation: upload and download many GB of video every time you want to use it, rather than downloading the program once.

People who do a lot of video editing usually already have decent PCs on which to do it.

[+] ipsum2|4 years ago|reply
My guess is that they don't want to run this on the user's computer (even though video editors have fairly beefy machines) because it would be easy to extract the ML models and use them, destroying their business model.
[+] markdown|4 years ago|reply
According to the pricing page, each video can only be 2 minutes long. So this appears to be geared towards social media video sharing and marketing.
[+] pininja|4 years ago|reply
I think that is the case. They are applying ML to all the clips too, which is server-side. They also say final renders are server-side.
[+] vultour|4 years ago|reply
Created an account just to be told it doesn't work if you're not using Chrome.
[+] rabuse|4 years ago|reply
Doesn't even work on Chrome for me.
[+] djstein|4 years ago|reply
Commenting to say this was a beautiful, straight-to-the-point landing page. We see lots of landing pages with sections of copy that just don't look great.
[+] randallsquared|4 years ago|reply
I have no real idea if this is good, because the landing page has only some images, even though it's supposed to be about video. Show me some video, then! There's a slider gallery showing a series of still images of an editing interface -- why not show it in action?

Likely there's something I'm missing, here.

[+] CharlesW|4 years ago|reply
It is good, but it took me a few minutes to realize it was talking about two different products. (And I did literally lol at the comically-large "Get Started Today →" button.)
[+] rchaud|4 years ago|reply
Most SaaS landing pages sell products that are boring and non-visual. It's hard to get excited about a CRM or an IDE or a Jira clone or CI/CD pipeline tool.
[+] gxqoz|4 years ago|reply
It's conspicuously lacking testimonials. I personally hate testimonials and never have understood their appeal. I don't care what some person I've likely never heard of has to say about this product. I'd rather see what it does.

Relatedly, I've never understood why some books have 4 pages of testimonials in the front. Once you get past 5 of these are you really more likely to buy the book?

[+] dharma1|4 years ago|reply
Many of these ML features exist in regular video editing/grading software like the excellent Davinci Resolve - which is free for the non-studio version and lightning fast.

I can see this being appealing for people who don't want to learn or install professional software; I think there is value in that. I'd like to see a client-side version of this app; many people have beefy gaming GPUs that could run the models client side.

[+] DoctorOW|4 years ago|reply
I've done professional video editing and VFX work. This will not get used among professionals. This machine learning tech is better than what's built into After Effects but that's irrelevant because the time it takes me to fix a mistake my software made is far smaller than the time it takes to learn new software that has a far smaller scope.

You're not going to beat Fusion, Nuke, or even consumer tools like Hitfilm at the VFX game. A better use of this tech would be to take the area you improved in and turn it into a plugin for all or one of these.

[+] bsenftner|4 years ago|reply
And you know the tech geeks in VFX tried this type of thing concurrently with the research, and probably have in-house mature tools already. I know from my time at R&H, back in 2005 there was a ML effort to support compositing and color timing TDs.
[+] bogwog|4 years ago|reply
What is "impossible video"? As someone who doesn't know much about video editing, none of what they're showing looks "impossible" to me. I'm pretty sure I've seen videos where someone added clones of themselves, like the skateboarder sample.

Guess the actual target demographic gets it?

[+] joshuaengler|4 years ago|reply
The idea is a bit different. Filming clones usually requires leaving the camera "still" without moving it, then you re-film the same scene and same position multiple times so you can drop the backdrop very easily using very basic cutting or masking.

This is different though. No cutting or masking; the AI itself is doing it automatically. No filming the same take with the same camera position 3 times in a row and then doing tedious editing that takes hours.

I have my doubts as to how well it would work... But assuming it did then it's a real game-changer in the editing world because it could not only save a ton of time, but actually produce new abilities, like cloning with object crossover, and while the camera is moving around erratically.
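A sketch of the compositing step being described, assuming per-frame subject masks already exist (e.g. from a segmentation model); the frames and mask here are toy values, not Runway's actual pipeline:

```python
import numpy as np

def composite_clone(frame: np.ndarray, take: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Paste the masked subject from `take` onto `frame`.
    mask is float in [0, 1] with shape (H, W, 1); frames are (H, W, 3)."""
    return (mask * take + (1.0 - mask) * frame).astype(frame.dtype)

# Toy 2x2 frames: clone the subject pixel at (0, 0) from a second take.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
take = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.zeros((2, 2, 1)); mask[0, 0] = 1.0
out = composite_clone(frame, take, mask)
print(out[0, 0])  # the cloned subject pixel
```

The hard part is producing a good `mask` per frame while the camera moves; with a locked-off camera you get the clone trick almost for free, which is exactly the limitation the parent describes.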

[+] brainless|4 years ago|reply
I think needing a super fast internet connection is a huge barrier to adoption. Even in India, even if they had backend servers here, expecting everyone doing video to have a 1 Gbps connection is not realistic. [1]

Also, the demo tells me the target audience is not pro video folks (film/TV) rather individuals. They might find it harder to keep spending so much on shuttling data. The power of ML in video has been clear to many but we need these tools to work offline.

Are there good ML-specific chips for PCs/laptops? I mean like the ones that Apple keeps talking about? I am not in the domain so I don't know, but I guess GPUs are the best chips for this; is there any reason this software would not work on a beefy RTX 30X0-based device?

Update

1. I mention "even in India" because bandwidth pricing here is still relatively cheap, globally speaking. OK, not cheap by typical Indian household measures, but if you are doing pro video then yeah, cheap infrastructure.

[+] sillysaurusx|4 years ago|reply
Why would a 1Gbps connection be required for streaming video? Netflix, YouTube, and Twitch seem to be able to service most of the world.

I haven’t actually read the article, so perhaps they say as much. But if so, that’s surprising.

EDIT: thinking a bit more carefully, the limitation is upload speed, not download speed. In that area, the US has been lagging behind in tech.

Still, I wouldn’t bet against it being viable. I remember the first few months YouTube launched. The video quality was atrocious, the worst on the internet by far — this isn’t revisionism; other platforms tried to compete on quality.

Didn't matter; YouTube won. And now we enjoy 4K streaming.

[+] azinman2|4 years ago|reply
Wow I’m extremely impressed. This is clearly the future of video editing — and deep fakes. How soon until Adobe buys them?
[+] Jeff_Brown|4 years ago|reply
The techie in me is tickled. The citizen in me is filled with dread.
[+] heleninboodler|4 years ago|reply
"How soon until Adobe buys them" was exactly my thought too.
[+] mrspeaker|4 years ago|reply
Is this any better than the current After Effects auto-roto feature? (Real question... is it?!) I thought they were already using ML models to do rotoscoping.
[+] mrspeaker|4 years ago|reply
Does anyone have any resources on how you might make this yourself from scratch using machine learning? Obviously not this polished, but just a really rough-and-dodgy version to learn how it works.
[+] ergot_vacation|4 years ago|reply
Check out the YouTube channel Two Minute Papers if you haven't already. He covers a lot of stuff like this, and much of it comes with Google Colab demo scripts ready to run. Of course they're often poorly tested/maintained, and may take work to finish out into a tool for your specific use case, but you do at least get to touch the actual code and see how it works.
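For a feel of the problem before reaching for ML at all: the classic rough-and-dodgy baseline is a difference matte against a "clean plate" (a shot of the scene with nobody in it). A learned model, e.g. a pretrained segmentation network, essentially replaces the mask-producing function below with something that needs no clean plate. This is a minimal illustration, not how Runway does it:

```python
import numpy as np

def difference_matte(frame: np.ndarray, clean_plate: np.ndarray,
                     threshold: float = 30.0) -> np.ndarray:
    """Crude foreground mask: mark pixels that differ from an empty
    'clean plate' shot of the same scene. Fails on moving cameras and
    shadows, which is exactly what the ML versions fix."""
    diff = np.abs(frame.astype(np.float32) - clean_plate.astype(np.float32))
    return (diff.mean(axis=-1) > threshold).astype(np.float32)

plate = np.zeros((2, 2, 3), dtype=np.uint8)
frame = plate.copy(); frame[0, 0] = 200   # "subject" enters one pixel
m = difference_matte(frame, plate)
print(m)  # only the changed pixel is flagged as foreground
```

Swapping this for, say, a DeepLab-style person-segmentation model gives you the rough version of what the Colab demos above implement.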
[+] ale42|4 years ago|reply
... "all on the web", says their main page. I'm not sure that this is a great selling point... web applications are easy to start with (no installation), but it's very hard to match the responsiveness of a local native application; plus, not everyone can have a 1 Gbps connection...
[+] mackrevinack|4 years ago|reply
and as another commenter pointed out, it only works on chrome and not any of the other browsers so a lot of people are going to have to install something anyway
[+] BuildTheRobots|4 years ago|reply
The automated masking does look impressive, but you can do the same with Davinci Resolve, which charges a one-off £250 for a lifetime license with free upgrades.

At $35/month Davinci works out cheaper in less than 10 months...

[+] runawaybottle|4 years ago|reply
Is this stuff not possible with existing editing/compositing tools? Obviously the ease of use here is going to make it mainstream, but is ML changing the game in those other spaces yet?
[+] ergot_vacation|4 years ago|reply
If you don't mind a video that's a bit "Youtube-y", Corridor Crew had a great example of the situation here (including Runway!): https://www.youtube.com/watch?v=fmJ74774RO8. The short answer is that AI isn't mostly making new things possible, so much as it's making old things, like compositing, much MUCH faster. 10x so in some cases. Other videos have shown similar stuff getting gradually integrated into the larger production software.

Machine learning is also genuinely making new things possible. Two Minute Papers has a video here: https://www.youtube.com/watch?v=22Sojtv4gbg demonstrating a model that trains on driving footage, then adjusts video game footage to look almost indistinguishable from a real-world shot (and in real time!). This is technically a video game application, but it's not hard to imagine how this could be used in a video or movie context.

The extent to which this site/software actually captures these broader trends is probably minimal, but they're definitely out there. I think we're on the edge of another huge shift, like the one where CG became more practical and down-to-earth so it could be used anywhere. A lot of the same stuff will be done, but by a couple of people rather than an army of editors.

[+] orangegreen|4 years ago|reply
The auto-rotoscoping feature seems very useful if it works well. Auto-rotoscoping in After Effects can be extremely finicky and usually requires frame-by-frame touch ups. This would be very useful for people without green screens.
[+] Daub|4 years ago|reply
These foreground/background separations look magic. However, they also look cherry-picked. The first skater boy is shot with a fixed camera and a figure strongly contrasted against the bg. The woman walking against a misty bg is also a soft target for separation. The figure removal in the car park would have been easy enough even ten years ago. The depth maps look more promising. They seem high enough quality for a decent fog pass.
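The fog pass is straightforward once you have a depth map. A minimal sketch (the exponential falloff and the fog color are illustrative choices, not anything from Runway):

```python
import numpy as np

def fog_pass(frame: np.ndarray, depth: np.ndarray,
             fog_color=(200, 200, 210), density: float = 1.5) -> np.ndarray:
    """Blend toward a fog color with strength growing with depth.
    depth is normalized to [0, 1]; exponential falloff is a common choice."""
    f = 1.0 - np.exp(-density * depth[..., None])   # per-pixel fog factor
    fog = np.array(fog_color, dtype=np.float32)
    return ((1.0 - f) * frame + f * fog).astype(np.uint8)

frame = np.zeros((1, 2, 3), dtype=np.uint8)
depth = np.array([[0.0, 1.0]])    # one near pixel, one far pixel
out = fog_pass(frame, depth)
print(out[0, 1])  # the far pixel is pulled toward the fog color
```

The quality of the result depends entirely on the depth map; noisy or flickering depth shows up as pulsing fog.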

Not sure about the cloud-based aspect. I am teaching video online right now. Bandwidth issues are making many students drop the course.

[+] bayindirh|4 years ago|reply
While this is impressive, the foundation of the tech was already there for quite some time.

Any modern mirrorless camera is doing the same analysis on the scene with object detection and tracking. So, they're just doing it on a video stream without phase/depth data.

Sony, Nikon, Canon, Panasonic and Fuji have similar technologies built into their cameras, but having this on your desktop for use with your videos is a nice step forward.

[+] jcun4128|4 years ago|reply
Is there tech yet where you can type out a transcript and it'll make a movie? That would be pretty wild.
[+] praveentiwari|4 years ago|reply
I use ScreenFlow today but this looks much more powerful and innovative. Will give it a try for sure.