top | item 27559106

GAN Theft Auto [video]

388 points | ALittleLight | 4 years ago | youtube.com

96 comments

[+] ShamelessC|4 years ago|reply
Great work! Hacker News still seems to have a deeply skeptical culture with regard to machine learning - not sure why. There's always someone saying it's "not novel" and it's "just doing x".

Overfitting is a known issue in machine learning, people. If you still think all neural networks are doing is memorizing the dataset completely in the year 2021 - you might want to revisit the topic. It is one of the first concerns anyone training a deep model will have, and to assume this model is overfit _without_ providing specific examples is arguing in bad faith.

Sentdex has shown his GAN is able to generalize various game logic like collision/friction with vehicles and also learns aspects of rendering such as a proper reflection of the sun on the back of the car.

He also showed weak points where the model is incapable of handling some situations, and it even attempted the impossible task of "splitting a car in two" to try and resolve a head-on collision. Even though this is a failure case, it should at least give you some intuition that the GAN isn't just spitting out frames memorized from the dataset, because that never happens in the dataset.

You will need to apply a little more rigor before outright dismissing these weights as merely overfit.

@sentdex Have you considered a guided diffusion approach now that that's all the rage? It's all rather new still but I believe it could be applied to these concepts as well.

[+] sentdex|4 years ago|reply
Heh, yeah, tough crowd I guess. The full code, models, and videos are all released and people are still skeptical.

I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them. Drives me nuts. Not sure why all the hate when you could just see for yourself. I'd welcome someone who can actually prove the model just "memorized" every combo possible and didn't do any generalization. I imagine the original GameGAN researchers from NVIDIA would be interested too.

Interesting @ guided diffusion - wasn't aware of its existence until now. We've had our heads down for a while. Will look into it, thanks!

[+] rasz|4 years ago|reply
One of the main problems with ML/NN is that it often works like magic, i.e. the trick works as long as the audience doesn't know the secret behind it. It's fascinating to a gullible audience, mundane bordering on boring to practitioners.

My Tiger repelling rock^^^^^^leopard detection model works great on all animal pictures ... until you feed it a sofa https://web.archive.org/web/20150703094328/http://rocknrolln...

>able to generalize various game logic like collision/friction with vehicles and also learns aspects of rendering such as a proper reflection of the sun on the back of the car

It did none of that. What this model did is learn all the frames of the video and their chronological order according to the input.

> impossible task of "splitting a car in two" to try and solve a head-on collision.

It played back both learned versions at once, like a classifier reporting its confidence that a round thing is 50% ball and 50% orange.

[+] andrepd|4 years ago|reply
> Hacker News still seems to have a deeply skeptical culture with regard to machine learning

Is... that a bad thing? Skepticism is good. When it's about something as hyped as "deep learning", even more so.

[+] jwilber|4 years ago|reply
I like your YouTube videos in general and think this content is a great benefit to the community.

I wouldn’t take the few negative comments personally - I’ve seen many GAN architectures that heavily overfit (including my own bobross pix2pix) get a lot of praise, while ‘less violating’ models (like yours) get more skepticism. Skepticism isn’t bad! But I’d wager in your case it may be because you’re a YouTuber, and other ml YouTubers are notorious for ripping off content (eg Siraj).

Not really related to this, but I’d personally love to see the difference in training time it would take an RL agent to adequately learn to drive a car in GTA versus to adequately fly a helicopter.

[+] ALittleLight|4 years ago|reply
Because you're using the word "your" I just feel the need to clarify that I didn't create this. I just saw it on YouTube and thought it was neat.
[+] emptyparadise|4 years ago|reply
One throwaway line about GAN operating systems now made me want to see a shell GAN. Keypresses as inputs, 80x24 terminal screens as outputs. Could a neural network dream of Unix?
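The I/O contract for such a thing is simple to sketch. Everything below is hypothetical - a numpy sketch of the shapes involved, where `step` stands in for a trained generator (a real one would be a learned network, not this hand-written shift-and-echo rule):

```python
import numpy as np

ROWS, COLS = 24, 80  # classic terminal geometry

# Stand-in for a trained generator: maps (previous screen, keypress)
# to the next screen of character codes.
def step(screen: np.ndarray, key: int) -> np.ndarray:
    assert screen.shape == (ROWS, COLS)
    # Hypothetical "dynamics": scroll up one row, then echo the
    # pressed key into the bottom-left cell, like a shell prompt.
    nxt = np.roll(screen, -1, axis=0)
    nxt[-1] = ord(" ")
    nxt[-1, 0] = key
    return nxt

screen = np.full((ROWS, COLS), ord(" "), dtype=np.int64)
for key in b"ls\n":          # feed keypresses one at a time
    screen = step(screen, key)

print(screen.shape)  # (24, 80)
```

Training data would just be recorded (keypress, screen) pairs from real terminal sessions, same as the recorded (controller input, frame) pairs used here.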
[+] sentdex|4 years ago|reply
I don't see why not. Might be something fun to try tbh.
[+] whalesalad|4 years ago|reply
GPUs: am I a joke to you? Instead of using them to render polygons, let’s use them to train neural networks that produce models that make them unnecessary. I’m oversimplifying - but pretty wild nonetheless.
[+] Sharlin|4 years ago|reply
Wait, you mean that "GPU" doesn't mean "GAN Processing Unit"? ;)
[+] slver|4 years ago|reply
Well, neural networks run (fastest) on GPUs.
[+] nitrogen|4 years ago|reply
Something I'd like to see is a visualization of subsets of the network's internal state that correlate with simple quantities like compass direction, velocity, position, etc. It'd be really fascinating to see where in the model these things are being learned, whether they are concentrated in a small area or spread out, and whether this is somewhat consistent across different iterations of the model.
[+] okamiueru|4 years ago|reply
Can someone explain a bit more on the long term applicability, or maybe other use cases that might be easier to appreciate?

The reason why I ask is that it seems very challenging to generate the training data for such systems. Could someone explain how this can go further than to just replicating X? So, if assuming some creative freedom, could you give an idea of what the long term application of this would be?

NB: please take my questions at face value without thinking I'm implying this isn't cool for what it is. I'm all for people having fun. I'm all for projects not needing to tackle some grander issue.

[+] tiborsaas|4 years ago|reply
In the future we might have a fourth common media format besides pictures, videos and audio: GAN records.
[+] 4dahalibut|4 years ago|reply
Hey sentdex this is absolutely awesome! Playing with exotic target types like generating games is IMO where the fun is in ml :)

Do you see yourself taking this train of play further?

[+] sentdex|4 years ago|reply
We'd like to try some further GTA stuff, as well as some IRL stuff. Have seen some recent IRL GAN stuff, and it looks super interesting.

There's just something about AI-based environments that is particularly intriguing!

[+] TinkersW|4 years ago|reply
Looks interesting, if very far from practical-- too bad it requires a "DGX" station to train

It seems to flicker/fade things in a lot, like the random poles that keep appearing and disappearing. It seems like there is not enough focus on temporal consistency or something?

[+] bruce343434|4 years ago|reply
If you look at the source output image before it was upscaled, you'll notice the resolution is too low and thin objects "fall between" the pixels. The upscaler then interprets them as air, it seems.
[+] someperson|4 years ago|reply
It should be possible to take arbitrary video training data (whether from a game or real-life) and automatically reconstruct the 3D models of all vehicles in the scene (and the skybox) and "play back" the scene in a video game engine.

This is the direction virtual and augmented reality is headed (Facebook Codec Avatar, and their room reconstruction technology).

[+] junon|4 years ago|reply
This is incredible. Took me a minute to realize this isn't an image transform of some kind.

Really well done.

[+] HerrmannM|4 years ago|reply
Great work! I'm curious about what could be achieved in this space in the future.

Why can't you share the GTA5 mod and collection script? I'm curious about that part too -- obtaining good data is always hard.

Cheers and all the best!

[+] dividuum|4 years ago|reply
Impressive. Makes you wonder if at some point in the future there isn't a game engine any more but tons of training material and you play in a generated dream.
[+] jsiepkes|4 years ago|reply
Certainly impressive. And sure, maybe in a distant future. Though I think this is one of those things where creating a working prototype that is 75% complete is the "easy" part. The other 25% (which you need for an actual working product) will take forever. Like self-driving cars, nuclear fusion, etc.
[+] slver|4 years ago|reply
Maybe not entirely, because just like a dream, the rules of a neural network tend to drift and be somewhat fuzzy.

Unless it's a high-concept game whose very goal is offering you a dream environment.

But I do believe neural networks will get into everything. They're the last missing piece of our compute model.

[+] darepublic|4 years ago|reply
Cool stuff. Looking forward to more realistic NPCs and player decision driven stories in an open world sandbox
[+] boyadjian|4 years ago|reply
WOW, this is awesome. The video gives the impression we are dreaming
[+] Randomoneh|4 years ago|reply
I fail to see novelty here. What's the size difference between the model and all of the 64x32 image training data? If the difference is not significant, you're basically just scrubbing a video, right?
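That comparison is easy to put numbers on. A back-of-envelope sketch - every figure here is an assumption for illustration, not taken from the released dataset or model:

```python
# Hypothetical back-of-envelope: could the model be "just storing frames"?
frames = 500_000            # assumed number of training frames
h, w, channels = 32, 64, 3  # low-res frames, per the thread
bytes_per_frame = h * w * channels          # uint8 pixels
dataset_bytes = frames * bytes_per_frame

params = 50_000_000         # assumed generator parameter count
model_bytes = params * 4    # float32 weights

print(f"dataset ~ {dataset_bytes / 1e9:.1f} GB")  # ~ 3.1 GB
print(f"model   ~ {model_bytes / 1e9:.1f} GB")    # ~ 0.2 GB
```

Even if the model did turn out smaller than the raw frames, size alone wouldn't settle it; the real test is behavioral - feed it input sequences that never occur in the dataset and see what it does.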
[+] sentdex|4 years ago|reply
The GAN model is the game environment. You're playing a neural network. The novelty is that there's no game engine and no rules; the model just learned how to represent the game, and you can play it.