Make-A-Video: AI system that generates videos from text

[+] slhomme|3 years ago|reply

As the owner of a video production studio, I find this kind of tech mind-blowing, and it makes me equally excited and scared. I can see how we could incorporate such tools in our workflows, and at the same time I'm worried it'll be used to spam the internet with thousands and thousands of soulless generated videos, making it even harder to see through the noise.

A fun related experiment: I wanted to see what kind of movies AI would generate, so I created a "This Movie Does Not Exist" website[1] that auto-generates fake movies (movie posters + synopsis). It basically uses GPT-3 to generate some story plots, and then uses that as a prompt (with in-between steps) for Stable Diffusion. Results may vary, but it definitely surprises sometimes with movies that look and sound amazing!

[1] This Movie Does Not Exist: https://thismoviedoesnotexist.org/

[+] skocznymroczny|3 years ago|reply

Reminds me of South Park when Cartman was pretending to be a robot and was made to invent movie prompts

"Adam Sandler is like, in love with some girl, but then it turns out that the girl is actually a Golden Retriever. Or something."

[+] maxov|3 years ago|reply

I love this! But after trying it a few times I got this result :). So fascinating.

https://thismoviedoesnotexist.org/movie/the-terminator

Brings up the age-old question of how much the learning in these models is just memorization. Though in cases like these it’s hard to tell.

[+] hiidrew|3 years ago|reply

I think some refer to this as the dead internet theory, i.e. AI-generated content becomes the majority of media on the internet instead of humans posting (I may be wrong in my explanation, but I think that's the premise).

It's scary to think about, but it seems plausible, like if someone can make an app with TikTok-like ubiquity that has only AI content. Although, to your point, I imagine there will be so much nonsensical noise that curating will become an even more useful skill than it is today.

[+] moron4hire|3 years ago|reply

>> spam the internet with thousands and thousands of soulless generated videos

Unfortunately, that's already happening.

https://www.youtube.com/watch?v=w7oiHtYCo0w

From what I can see, YouTube has done quite a bit of work to clean up YouTube Kids, but it's kind of an arms race.

There's this worrying issue in AI ethics discussions where most people seem to assume the problems and dangers of AI are still off in the future, that as long as we don't have the malicious AGI of sci-fi stories, then AI and "lesser" algorithmically generated content isn't harming society.

I think that's not true at all. I think we've seen massive damage to social structures thanks to algorithmic feeds and generated content, already, for years now. Just because they aren't necessarily neural-network-based doesn't mean they aren't something to worry about.

So I don't see AI as a particularly different worrisome problem. It's an extension of an already existing, worrisome problem that most people have ignored beyond occasionally complaining about election results.

[+] agentwiggles|3 years ago|reply

This one is too funny: https://thismoviedoesnotexist.org/movie/the-virginity-pact

[+] twobitshifter|3 years ago|reply

Saw this clip today in response to a tweet by Musk saying a Cybertruck can act as a boat. I thought either this was made by an AI or it will be soon.

https://twitter.com/dvorahfr/status/1575508907593711618?s=46...

[+] barbariangrunge|3 years ago|reply

I know some extremely hard-working independent filmmakers who struggle so hard to get noticed. After this tech goes mainstream in 5 years and gets really good, I don’t know what they’re going to do.

[+] extragood|3 years ago|reply

Sleeping Beauty actually looks really good: https://thismoviedoesnotexist.org/movie/sleeping-beauty

Edit:

Worst (best?) tone clash: https://thismoviedoesnotexist.org/movie/stalked-by-a-friend

[+] usefulcat|3 years ago|reply

> I'm worried it'll be used to spam the internet with thousands and thousands of soulless generated videos

I agree, and I think when that happens, it will tend to increase the value of curation. High-quality curation, that is, probably done mostly by hand, as opposed to the at-best-mediocre automated curation that is commonly used.

It could be bad for things like YouTube, for example. I think there will be an arms race between generated video content on one side and automated curation on the other. I mean, you can still leverage viewer choices for curation (looking at what people are watching a lot of), but that is just shifting the burden of curation to users. Few people will be willing to sift through dozens of cheaply generated crap videos to find something they actually want to watch.

[+] cutups|3 years ago|reply

I was thinking about something like this site, but also taking some randomized existing plot outlines to generate specific stills from each part of the plot. Might require isolating character archetypes too?

Great work with this as is!

[+] pmarreck|3 years ago|reply

https://thismoviedoesnotexist.org/movie/the-yetis-last-stand

The description text does not convert sentences with carriage returns (or probably, newlines) into separate divs or whatever HTML element you'd prefer, FYI! Otherwise, very cool!
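A minimal sketch of the kind of fix being suggested here, assuming the synopsis arrives as plain text. The function name `description_to_html` and the choice of `<p>` elements are illustrative only, not the site's actual code.

```python
from html import escape


def description_to_html(text: str) -> str:
    """Turn a plain-text description into one <p> element per non-empty line."""
    # str.splitlines() splits on \n, \r\n and a bare \r (among other line
    # boundaries), so carriage returns and newlines are both covered.
    lines = [line.strip() for line in text.splitlines()]
    # Escape each line so characters like & and < stay literal text in HTML.
    return "\n".join(f"<p>{escape(line)}</p>" for line in lines if line)


if __name__ == "__main__":
    # Hypothetical synopsis text, made up for this example.
    print(description_to_html("A yeti defends its home.\r\nHunters close in & time runs out."))
```

Blank lines are dropped rather than rendered as empty paragraphs; whether that is the right choice depends on how the page is styled.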

[+] WheelsAtLarge|3 years ago|reply

Cool toy. One of the most useful side effects of AI right now is idea generation. Market this as an idea generator for movies and such and people will eat it up. Try posting it on the entertainment focused area of Twitter and people will go nuts for it.

[+] klondike_klive|3 years ago|reply

These prompts are way better than the drivel I see on Amazon Prime. Half their movie descriptions don't even tell you anything about the film, they seem to be just a random paragraph from the pitch document.

[+] gcanyon|3 years ago|reply

"Soulless" -- they'll be low quality, both in rendering and in plot/acting, but they'll be anything but soulless. Each will be a labor of love of someone with a dream.

[+] sgrove|3 years ago|reply

You should add a “tweet this movie” button that pre-populates the image and the title! I immediately wanted to share one of the funny suggestions.

[+] sixQuarks|3 years ago|reply

For those who are scared about this technology, it’s good to look at what AI has done to Chess.

The best chess seems to be when AI is used along with humans. I think image and video AI will best be exploited when human input is also taken into account.

There is still something special about human creativity, and I think AI will just be another tool to expand that. At least in the short term, say 10 years perhaps. AI will probably one day take over all aspects of creativity and humans won’t be able to contribute.

[+] hugozap|3 years ago|reply

Is the generation happening in real time? I'm curious about the costs of running something like this.

[+] _tom_|3 years ago|reply

First one I got was "The Time Traveler's Wife", which does exist.

[+] samstave|3 years ago|reply

Ha - getting app too busy errors so can’t see your site…

[+] picsao|3 years ago|reply

[deleted]

[+] j_m_b|3 years ago|reply

What I want to see is a model that can generate 3D models for use in applications such as Blender. It would provide a good starting point for someone with talent to make something beautiful. Or just save people like me time when making games.

[+] tmjdev|3 years ago|reply

This looks like the video equivalent of Dall-E 1. Hard to believe how far we've come so quickly.

The paper talks about "pseudo 3D attention layers" that are used in place of temporal attention layers for each dimension due to memory consumption. It seems like AI research is vastly outpacing GPU development.
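A rough sketch of why splitting attention into a spatial pass and a temporal pass saves memory, counting only the pairwise attention scores. The sizes below are hypothetical and not taken from the paper, and real layers also have heads, channels, and batch dimensions that this ignores.

```python
def full_attention_scores(frames: int, height: int, width: int) -> int:
    """Scores needed when every token attends to every other token across space and time."""
    tokens = frames * height * width
    return tokens * tokens


def factorized_attention_scores(frames: int, height: int, width: int) -> int:
    """Scores for spatial attention within each frame plus temporal attention at each position."""
    positions = height * width
    spatial = frames * positions ** 2   # one (H*W) x (H*W) map per frame
    temporal = positions * frames ** 2  # one T x T map per spatial position
    return spatial + temporal


if __name__ == "__main__":
    T, H, W = 16, 16, 16  # hypothetical: 16 frames of 16x16 latent tokens
    full = full_attention_scores(T, H, W)
    factorized = factorized_attention_scores(T, H, W)
    print(f"full: {full:,}  factorized: {factorized:,}  ratio: {full / factorized:.1f}x")
```

The gap widens as the clip gets longer, since the full variant grows with the square of the total token count while the factorized one grows roughly linearly in the number of frames for the spatial part.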

[+] londons_explore|3 years ago|reply

Indeed - it's not hard from a research point of view - it's hard from a compute perspective because adding one more dimension requires hundreds of times more compute.

Even then, these videos are only like 50 frames long - and you would want a real movie to be hundreds of thousands of frames long.
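For a sense of scale, a quick back-of-the-envelope calculation, assuming a two-hour feature at 24 frames per second (both figures are assumptions for illustration) and the roughly 50-frame clips mentioned above.

```python
def frame_count(minutes: float, fps: int = 24) -> int:
    """Number of frames in a video of the given length at the given frame rate."""
    return round(minutes * 60 * fps)


if __name__ == "__main__":
    feature_frames = frame_count(120)  # assumed two-hour feature
    clip_frames = 50                   # approximate clip length cited in the comment
    print(f"feature: {feature_frames:,} frames")
    print(f"that is about {feature_frames // clip_frames:,} clips of {clip_frames} frames")
```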

[+] tiborsaas|3 years ago|reply

Hardware has probably always lagged behind cutting-edge research. Just consider video games: they have pushed hardware limitations very hard since Pong.

It's a good thing, to be fair: forcing research teams to optimize their projects is beneficial and creates competition for limited resources. This gets a bit skewed when we consider a university research team vs. a MANGA-type company, but the team behind Stable Diffusion proved that innovation can come from unexpected places.

[+] agitator|3 years ago|reply

What's mind blowing is that you can extrapolate where this is going to go. Eventually, you will be able to generate full movie scenes from descriptions.

What's interesting to me is how this is so similar to human imagination. Give me a description and I will fabricate the visuals in my mind. Some aspects will be detailed, others will be vague, or unimportant. Crazy to see how fast AI is progressing. Machines are approaching the ability to interpret and visualize text in the same way humans can.

This also fascinates me as a form of compression. You can transmit concepts and descriptions without transmitting pixel data, and the visuals can be generated onsite. Wonder if there is some practical application for this.
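A rough illustration of the size gap behind the compression idea, comparing a text prompt to uncompressed pixel data. The clip dimensions are hypothetical, and the prompt is one of the examples quoted elsewhere in this thread.

```python
def raw_video_bytes(width: int, height: int, frames: int, channels: int = 3) -> int:
    """Size of uncompressed 8-bit video: one byte per channel, per pixel, per frame."""
    return width * height * channels * frames


def prompt_bytes(prompt: str) -> int:
    """Size of the prompt when sent as UTF-8 text."""
    return len(prompt.encode("utf-8"))


if __name__ == "__main__":
    prompt = ("A golden retriever eating ice cream on a beautiful "
              "tropical beach at sunset, high resolution")
    video = raw_video_bytes(width=256, height=256, frames=16)  # hypothetical clip size
    text = prompt_bytes(prompt)
    print(f"prompt: {text} B, raw video: {video:,} B, ratio: about {video // text:,}x")
```

The comparison flatters the prompt: the receiver needs the same model weights (and a fixed random seed) to regenerate the same pixels, and ordinary video codecs already shrink raw frames considerably.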

[+] dzink|3 years ago|reply

Whenever there is an explosion of content, curation and search become important. Meta has many of the products people use to show off their taste, so having more content to curate and share is good for their ecosystem (until people stop consuming as much because they know they can produce even better with their own imagination). This may be good for Google - the more content there is, the more you need search to find it.

The downsides: there will be less money out there for creators, because that becomes a commodity. You will be able to make money if you are known for quality content polishing, editing and generally bringing that last 5-10% of generated content to look perfect (all the way until automated tools are trained to do that as well). AI will automate and improve most white-collar jobs. Instead of generators, everyone will become curators, as taste will be more important (until that's trained into the system as well).

For deeper levels of consequence we have to look at history: how did the world change when people finally got paper after most writing was done on animal skins (more got to write, the richest or most powerful didn't have the only say), or water piped to your house after you had to carry buckets and dig wells (It freed up time for everyone for more interesting tasks). Now GPUs are going to be the new paper and the new PVC. Yes, software has been eating the world for a while, but you won't be able to brainstorm without AI generating the first pass.

[+] pesenti|3 years ago|reply

Research paper: https://makeavideo.studio/Make-A-Video.pdf

Examples: https://make-a-video.github.io/

Demo site: https://makeavideo.studio/

I am told live demo and open model are on the way.

[+] fasteddie31003|3 years ago|reply

I'm rooting for this tech. Hopefully this will get modern movies out of their low-risk reboot loop, since it will be cheaper to make movies that have new storylines that are commercially untested. I'd be happy to watch a movie that doesn't look AAA, but has compelling writing and makes me think. Or maybe I'll just stick to books.

[+] Xelynega|3 years ago|reply

I don't think modern movies are stuck in a "low risk reboot loop" because of the cost to produce; it's because of the potential profit.

Why spend money on a film with new IP and ideas that you're not sure will be popular when the data science team has already worked with marketing to figure out exactly what movie will sell well?

Good luck finding your movie with compelling and thought-provoking writing under the big pile of movies produced by committee that show up above it in discovery algorithms.

[+] RupertEisenhart|3 years ago|reply

Have you tried MUBI? It takes a lot of the hassle out of finding quality arthouse films, there is a lot of good stuff on there.

Though I must admit that if I didn't have friends holding my hand through the minefield of modern cinema, I would also just stick to books.

[+] adamsmith143|3 years ago|reply

Not sure we're quite there yet. A real movie needs a lot of dialogue, speech, sound effects, music, etc. Even the best LLMs don't do really coherent storytelling yet, and a script for a movie is just the absolute bare bones.

[+] simonw|3 years ago|reply

I was pretty surprised to see that the WebVid-10M Dataset used as part of this training - https://m-bain.github.io/webvid-dataset/ - consists entirely of video preview clips scraped from Shutterstock!

I built a quick search engine over that data:

https://webvid.datasette.io/webvid/videos

Wrote more about that here: https://simonwillison.net/2022/Sep/29/webvid/

[+] nicoslepicos|3 years ago|reply

Really cool seeing this & excited to really start playing with it. 2023 is going to be funky!

It's pretty wild to see how quickly the space of Generative AI Media is coming along.

I started a newsletter on the topic, called The Art of Intelligence (GPT-3 came up with the name) with the first post going out last Friday on the topic of how far are we from AI generated videos, and simulated worlds like the Holodeck given the rapid progress of these visual A.I. Thought y'all might find it interesting: https://artofintelligence.substack.com/p/dall-e-stable-diffu...

This type of progress also reminds me of a really lovely publication from 2017 in Distill.pub, on the topic of these A.I. enabled creation tools - I think y'all would enjoy seeing what folks were thinking even then: https://distill.pub/2017/aia/

[+] mlsu|3 years ago|reply

Clearly, attention really is all you need.

Are GPU vendors (well, GPU vendor, as far as I can tell) focusing heavily on increasing VRAM? My understanding is that transformers are pretty quick to train, but have significant memory costs.

When they say that video is infeasible with memory... does that mean that if we had enough memory (128? 256? GB) we would be able to realistically train such networks with temporal attention?

This is insanely exciting. It looks like we are limited, at this point, only by compute.

[+] indigodaddy|3 years ago|reply

So how long before we can say “Hey music-stable-diffusion-service, play me relaxing baroque violin and flute music in the style of Game of Thrones”?

[+] mjr00|3 years ago|reply

A long time. Much like self-driving cars, AI can take you 90% of the way there, but the last 10% is the difference between music that sort-of-seems-competent, and music that people will actually listen to and enjoy.

[+] anonymouse008|3 years ago|reply

I think we all can view a video of 'nails on a chalkboard' before we hear audio of nails on a chalkboard.

For some reason, the uncanny valley for sounds is much wider than the one for videos/pictures. The hand holding is uncanny in the family video, but I'm fine watching it for a second - it doesn't cause pain the way that same error would in music.

[+] ragazzina|3 years ago|reply

I've found this is my biggest issue with music streaming services. Often I just want another song similar to the one I am listening to, but no "generate radio from this track" comes close.

[+] echelon|3 years ago|reply

My start-up is working on this! We'll be launching a web and downloadable version soon.

Eg. https://www.youtube.com/watch?v=r_0JjYUe5jo --> https://vocaroo.com/1hgjjnVNqWjk

We're also working on film generation.

[+] anigbrowl|3 years ago|reply

That's so last week: https://mubert.com/ (not affiliated)

[+] mouzogu|3 years ago|reply

Wonder if all these things will bring about a kind of Cambrian explosion of creativity.

Imagine a future of Prompt Wizards, who are able to coax the AI to generate things in a very specific way.

Although we would probably need a much greater level of human curation. The way algorithms curate on YouTube and Spotify just doesn't really hit the spot.

Perhaps Stability and DALL-E already kind of showed that the value is not so much in the physical act of creating, but more so in the ability to express something that the AI can represent and which can connect with you.

[+] permo-w|3 years ago|reply

I really don’t understand the fear people have about these things. have I missed something and everyone else was placing huge value in out of context videos and pictures?

if you read “France declares war on Canada”, you’re not gonna believe it unless it’s coming from an extremely reputable source. so why would you trust a random unsourced video?

the absolute worst thing that’s gonna happen is that video-based social media is gonna be flooded with low (or even high) quality AI videos. and I ask you: who the fuck cares? are these places doing wonders for society as it is? what’s a bit more rubbish amongst all the rest?

I can think of many, many more upsides than down

[+] nemo44x|3 years ago|reply

When will one of these projects finally name themselves “Infinite Jest”? I’m guessing when the perfect pornography can be generated for you immediately, it will have an entertainment-to-death effect on a number of people. An Infinite Jest for conspiracy theories; one for shit posts, etc.

[+] danso|3 years ago|reply

It's funny that in terms of popular culture, we've all but accepted that Star Trek's Holodeck-type fantasy creation is something that will/should exist in a "proper" future. Yet all the intermediary technology to get there -- AI-generated visuals would be among the first such steps -- still makes most of us feel a little uncomfortable at first glance.

[+] Datenstrom|3 years ago|reply

The

> A golden retriever eating ice cream on a beautiful tropical beach at sunset, high resolution

example is terrifying.

[+] scifibestfi|3 years ago|reply

"Our research takes the following steps to reduce the creation of harmful, biased, or misleading content."

"Our goal is to eventually make this technology available to the public, but for now we will continue to analyze, test, and trial Make-A-Video to ensure that each step of release is safe and intentional."

Are they really going to do a replay of OpenAI and Stable Diffusion? Deja vu coming soon.

[+] dbieber|3 years ago|reply

Very exciting to see the field progressing so quickly. I wonder how quickly it's going to move forward from here. Will we be generating coherent audio to accompany these videos soon? Will we have multi-scene videos in the next year? Ones with coherent plot? Can we get there just by scaling up, or are other advances needed? Excited to see what comes next!

[+] forgingahead|3 years ago|reply

Amazing. And lucidrains is on the case as well: https://github.com/lucidrains/make-a-video-pytorch

[+] psychomugs|3 years ago|reply

I'm very interested in what will come out of this new (sub)medium. By virtue of video being a collaborative medium, I never feel like I'm getting a message from a singular consciousness like I do from less resource-intensive mediums like books (I know that book editors exist, but the medium has fewer filters to pass through compared to large products like movies). I could see this substantively lowering the barrier to entry for video and enabling a lot of new stories to be told.

384 comments