This might be a dumb question to ask, but what exactly is this useful for? B-Roll for YouTube videos? I'm not sure why so much effort is being put into something like this when the applications are so limited.
If you want to train a model to have a general understanding of the physical world, one way is to show it videos and ask it to predict what comes next, and then evaluate it on how close it was to what actually came next.
To really do well on this task, the model basically has to understand physics, and human anatomy, and all sorts of cultural things. So you're forcing the model to learn all these things about the world, but it's relatively easy to train because you can just collect a lot of videos and show the model parts of them -- you know what the next frame is, but the model doesn't.
Along the way, this also creates a video generation model - but you can think of this as more of a nice side effect rather than the ultimate goal.
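The training setup described above can be sketched in a few lines. This is a toy illustration of the next-frame-prediction objective, not any particular lab's pipeline; the helper names and the "video" data are made up for the example:

```python
import numpy as np

def make_training_pairs(video, context_len=4):
    """Slice a video of shape (T, H, W) into (context, next_frame) pairs.

    The trainer knows frame t; the model is only shown frames up to t-1,
    which is exactly the "you know the next frame, the model doesn't" setup.
    """
    pairs = []
    for t in range(context_len, len(video)):
        pairs.append((video[t - context_len:t], video[t]))
    return pairs

def next_frame_loss(predicted, actual):
    """Mean squared error: how close was the guess to what actually came next?"""
    return float(np.mean((predicted - actual) ** 2))

# Toy "video": 10 frames of 8x8 pixels, drifting brighter over time.
video = np.stack([np.full((8, 8), t / 10.0) for t in range(10)])
pairs = make_training_pairs(video)

# A naive baseline "model": predict that the next frame repeats the last one.
context, target = pairs[0]
loss = next_frame_loss(context[-1], target)
```

A real video model would replace the naive baseline with a network trained to drive this loss down, which is what forces it to pick up physics, anatomy, and so on.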
It doesn’t have to understand anything; none of these demonstrate reasoning or understanding.
All these models have just “seen” enough videos of all those things to build a probability distribution for predicting the next step.
That’s not bad, nor does it make them inherently dumb; a major component of human intelligence is built on similar strategies.
I couldn’t tell you which grammatical rules are broken in a text, or which physical rules in a photograph, but I can tell something is wrong using the same methods.
Inference can take you far with large enough data sets, but sooner or later, without reasoning, you will hit a ceiling.
This is true for humans as well: plenty of people go far in life with just memorization and replication, doing a lot of jobs fairly competently, but it doesn't work for everything.
Reasoning is essential for higher-order functions, and transformers are not the path to that.
Back when computers took up a whole room, you'd also have asked: "but what exactly is this useful for? Some simple calculations that anybody can do with a piece of paper and a pen?"
Think 5-10 years into the future; this is a stepping stone.
That's comparing apples to oranges though, isn't it? Generating videos is the output of the technology, not the tech itself. It would be like someone asking "this computer that takes up a whole room printed out ASCII art, what is this useful for?"
This is kind of an unfair comparison. What's the endpoint of generating AI videos? What can this do that is useful, contributes something to society, has artistic value, etc.? We can make educational videos with a script, but it's also pretty easy for motivated parties to do that already, and it's getting easier as cameras get better and smaller. I think asking "what's the point of this?" is at least fair.
We're preparing to use video generation (specifically image+text => video so we can also include an initial screenshot of the current game state for style control) for generating in-game cutscenes at our video game studio. Specifically, we're generating them at play-time in a sandbox-like game where the game plays differently each time, and therefore we don't want to prerecord any cutscenes.
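As a rough sketch of what that image+text => video call could look like at play-time: the request shape below is purely hypothetical (no real vendor's API is implied), just to show how a current-game-state screenshot and a prompt would travel together:

```python
import base64
import json

def build_cutscene_request(screenshot_png: bytes, prompt: str, seconds: int = 6):
    """Assemble an image+text => video request.

    Field names are illustrative only; a real service would define its own
    schema. The screenshot conditions the generator on the game's current
    visual style, and the prompt describes what should happen in the cutscene.
    """
    return {
        "prompt": prompt,
        "duration_seconds": seconds,
        # Binary image data is typically base64-encoded for a JSON payload.
        "init_image": base64.b64encode(screenshot_png).decode("ascii"),
    }

# A captured frame of the current game state would go here.
req = build_cutscene_request(b"\x89PNG...", "the hero walks into the ruined temple at dusk")
payload = json.dumps(req)
```

The latency and cost of a round trip like this at play-time is exactly the open question raised in the reply below.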
Okay, so is the aim to run this locally on a client's computer, or serve it from the cloud? How does the math work out so that it's not just easier at that point to render it in-game?
In its current state, it's already useful for B-roll, video backgrounds for websites, and any other sort of "generic" application where the point of the shot is just to establish mood and fill time.
But more than anything, it's useful as a stepping stone to more full-featured video generation that can maintain characters and story across multiple scenes. It seems clear that at some point tools like this will be able to generate full videos, not just shots.
This is a first step towards "the holodeck". You describe a scene and it exists. Imagine you could jump in and interact with it. That seems like something that could happen in 10-20 years.
You and your friends gather around the TV to watch a video about the time that you all traveled abroad and met a mysterious stranger. In the film, you witness each other take incredible risks, have intimate private conversations, and change in profound ways. Of course none of it actually happened; your voices and likenesses were fed into the movie generator. And did I mention in the film you’re driving expensive cars and wearing designer clothes?
Are they that limited? It's a machine that can make videos from user input: it can ostensibly be used wherever you need video, including for creative, technical and professional applications.
Now, it may not be the best fit for those yet due to its limitations, but you've gotta walk before you can run: compare Stable Diffusion 1.x to FLUX.1 with ControlNet to see where quality and controllability could head in the future.
Because it's pretty cool to be able to imagine any kind of scene in your head, put it into words, then see it be made into a video file that you can actually see and share and refine.
It's got a lot of potential as a way for Google to get paid for other people's skills and hard work, instead of the people who made all of that "data".
It’s kind of hilarious that anybody considers this “democratizing” media creation. How many people that need a video clip are going to be capable of running an open version of this themselves? The wonky “open” models aren’t even close. How much do you think these services are going to cost once the introductory period financed by race-to-the-bottom money stops? OpenAI already charges $200/mo if you want to be guaranteed more than 30-60 minutes of Advanced Voice.

The introductory period exists solely to get people engaged enough to push through blatantly stealing millions of artists’ creative output, so they can have a beautiful tool they sell to Hollywood for a whole lot of money that’s still less than traditional VFX. Meanwhile, everyone else gets to dink around in the useless free models or too-expensive-for-most prosumer tools, people with expensive video card arrays or the functional equivalent will still be niche tinkering hobbyists with inferior tooling and models, and the skilled commercial artists still employed are being paid shit because of market forces. Great job, SV. Making the world a better place.
aenvoker|1 year ago
https://www.reddit.com/r/aivideo/comments/1hbnyi2/comment/m1...
Another, more serious music video also made entirely by one person: https://www.youtube.com/watch?v=pdqcnRGzH5c. Don't know how long it took, though.
yieldcrv|1 year ago
My templates are all waiting for stock videos to be added, looping in the background.
You have no idea how cool I am with the lack of copyright protection afforded to these videos I will generate; I'm making my money other ways.