Great work! Hacker News still seems to have a deeply skeptical culture with regard to machine learning - not sure why. There's always someone saying it's "not novel" and it's "just doing x".
Overfitting is a known issue in machine learning, people. If you still think all neural networks do is memorize the dataset completely in the year 2021, you might want to revisit the topic. It is one of the first concerns anyone training a deep model will have, and to assume this model is overfit _without_ providing specific examples is arguing in bad faith.
Sentdex has shown his GAN is able to generalize various pieces of game logic, like collision and friction with vehicles, and that it also learns aspects of rendering, such as a proper reflection of the sun on the back of the car.
He also showed weak points where the model is incapable of handling some situations, and it even did the impossible task of "splitting a car in two" to try to resolve a head-on collision. Even though this is a failure case, it should at least give you some intuition that the GAN isn't just spitting out frames memorized from the dataset, because that never happens in the dataset.
You will need to apply a little more rigor before outright dismissing these weights as merely overfit.
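One concrete way to settle the memorization question, rather than argue about it: nearest-neighbour search from generated frames back into the training set. The sketch below uses toy data and an invented helper name, just to show the shape of the test (near-zero distances everywhere would suggest memorization; clearly nonzero ones would not):

```python
import numpy as np

def nearest_train_distance(generated, train_frames):
    """For each generated frame, the L2 distance to its closest
    training frame. Consistently near-zero distances would suggest
    memorization; clearly nonzero ones suggest novel frames."""
    gen = generated.reshape(len(generated), -1).astype(np.float64)
    train = train_frames.reshape(len(train_frames), -1).astype(np.float64)
    dists = []
    for g in gen:
        dists.append(np.sqrt(((train - g) ** 2).sum(axis=1)).min())
    return np.array(dists)

# Toy example: 100 random "training frames", one verbatim copy, one novel frame.
rng = np.random.default_rng(0)
train = rng.random((100, 48, 64, 3))
copied = train[:1]                    # exact copy -> distance ~0
novel = rng.random((1, 48, 64, 3))    # fresh noise -> clearly nonzero
print(nearest_train_distance(copied, train))
print(nearest_train_distance(novel, train))
```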
@sentdex Have you considered a guided diffusion approach now that that's all the rage? It's all rather new still but I believe it could be applied to these concepts as well.
Heh, yeah, tough crowd I guess. The full code, models, and videos are all released and people are still skeptical.
I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them. Drives me nuts. Not sure why all the hate when you could just see for yourself. I'd welcome someone who can actually prove the model just "memorized" every combo possible and didn't do any generalization. I imagine the original GameGAN researchers from NVIDIA would be interested too.
Interesting @ guided diffusion, not aware of its existence til now. We've had our heads down for a while. Will look into it, thanks!
One of the main problems with ML/NN is that it often works like magic, i.e. the trick works as long as the audience doesn't know the secret behind it. It's fascinating to a gullible audience, and mundane bordering on boring to practitioners.
>able to generalize various game logic like collision/friction with vehicles and also learns aspects of rendering such as a proper reflection of the sun on the back of the car
It did none of that. What this model did is learn all the frames of the video and their chronological order according to the input.
> impossible task of "splitting a car in two" to try and solve a head-on collision.
It played back both learned versions at once, like reporting the confidence of a round thing as 50% ball and 50% orange.
I like your YouTube videos in general and think this content is a great benefit to the community.
I wouldn’t take the few negative comments personally. I’ve seen many GAN architectures that heavily overfit (including my own bobross pix2pix) get a lot of praise, while ‘less violating’ models (like yours) get more skepticism. Skepticism isn’t bad! But I’d wager in your case it may be because you’re a YouTuber, and other ML YouTubers are notorious for ripping off content (e.g. Siraj).
Not really related to this, but I’d personally love to see the difference in training time it would take an RL agent to adequately learn to drive a car in GTA versus adequately fly a helicopter.
One throwaway line about GAN operating systems now made me want to see a shell GAN. Keypresses as inputs, 80x24 terminal screens as outputs. Could a neural network dream of Unix?
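A purely hypothetical sketch of what that interface could look like: one keypress in, one 80x24 grid of character logits out, with a recurrent hidden state carrying the "terminal" forward. Every size, weight, and the factorized output head here is invented for illustration; a real version would need training on recorded terminal sessions (e.g. ttyrec/asciinema dumps):

```python
import numpy as np

VOCAB, ROWS, COLS, HIDDEN, FEAT = 96, 24, 80, 128, 16
rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (HIDDEN, VOCAB))                 # keypress embedding
W_h = rng.normal(0, 0.1, (HIDDEN, HIDDEN))                 # recurrent weights
W_cell = rng.normal(0, 0.1, (ROWS * COLS * FEAT, HIDDEN))  # per-cell features
W_char = rng.normal(0, 0.1, (VOCAB, FEAT))                 # shared character head

def step(hidden, keycode):
    """One timestep: consume a keypress, emit the most likely next screen."""
    onehot = np.zeros(VOCAB)
    onehot[keycode % VOCAB] = 1.0
    hidden = np.tanh(W_in @ onehot + W_h @ hidden)
    feats = (W_cell @ hidden).reshape(ROWS * COLS, FEAT)
    logits = feats @ W_char.T                # (cells, vocab)
    screen = logits.argmax(axis=1).reshape(ROWS, COLS)
    return hidden, screen

h = np.zeros(HIDDEN)
h, screen = step(h, ord("l"))                # user types "l"
print(screen.shape)                          # (24, 80)
```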
GPUs: am I a joke to you? Instead of using them to render polygons, let’s use them to train neural networks that produce models that make them unnecessary. I’m oversimplifying - but pretty wild nonetheless.
Something I'd like to see is a visualization of subsets of the network's internal state that correlate with simple quantities like compass direction, velocity, position, etc. It'd be really fascinating to see where in the model these things are being learned, whether they are concentrated in a small area or spread out, and whether this is somewhat consistent across different iterations of the model.
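The standard tool for that kind of question is a linear probe: fit a linear map from hidden activations to the quantity of interest and see how well it predicts. A minimal sketch on synthetic activations (a real analysis would substitute the model's actual hidden states for the random ones here):

```python
import numpy as np

# Synthetic stand-in: 500 "hidden states" of dimension 64, with one
# direction that genuinely encodes velocity plus a little noise.
rng = np.random.default_rng(1)
n, d = 500, 64
acts = rng.normal(size=(n, d))
true_dir = rng.normal(size=d)
velocity = acts @ true_dir + 0.1 * rng.normal(size=n)

# Fit the probe by least squares and score it with R^2.
w, *_ = np.linalg.lstsq(acts, velocity, rcond=None)
pred = acts @ w
r2 = 1 - ((velocity - pred) ** 2).sum() / ((velocity - velocity.mean()) ** 2).sum()
print(round(r2, 3))   # close to 1.0 -> velocity is linearly decodable
```

A high R^2 on held-out states would mean the quantity is encoded somewhere linear-readable; inspecting the weights `w` hints at where.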
Me too! In a much simpler setting a former colleague of mine, Jacob Hilton, tried such an exploration for the vision part of an OpenAI CoinRun model. It’s the first part of this paper: https://distill.pub/2020/understanding-rl-vision/
Can someone explain a bit more about the long-term applicability, or maybe other use cases that might be easier to appreciate?
The reason I ask is that it seems very challenging to generate the training data for such systems. Could someone explain how this can go further than just replicating X? And, allowing some creative freedom, could you give an idea of what the long-term application of this would be?
NB: please take my questions at face value without thinking I'm implying this isn't cool for what it is. I'm all for people having fun. I'm all for projects not needing to tackle some grander issue.
Looks interesting, if very far from practical. Too bad it requires a DGX station to train.
It seems to flicker/fade things in and out a lot, like the random poles that keep appearing and disappearing. It seems like there is not enough focus on temporal consistency or something?
If you look at the source output image before it was upscaled, you'll notice the resolution is too low and thin objects "fall between" the pixels. The upscaler then interprets them as air, it seems.
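That failure mode is easy to reproduce outside the model. A one-pixel-wide pole can vanish entirely under naive subsampling, while surviving only faintly under area averaging (a toy sketch, not the project's actual pipeline):

```python
import numpy as np

img = np.zeros((16, 16))
img[:, 5] = 1.0                       # a one-pixel-wide vertical pole

# Nearest-neighbour 4x downsample: samples columns 0, 4, 8, 12 only,
# so column 5 is skipped and the pole disappears completely.
nearest = img[::4, ::4]

# Area (4x4 block-average) downsample: the pole survives, but dimmed
# to 4/16 of its brightness, easy for a later stage to discard.
area = img.reshape(4, 4, 4, 4).mean(axis=(1, 3))

print(nearest.max())                  # 0.0  -> pole gone
print(area.max())                     # 0.25 -> pole barely there
```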
It should be possible to take arbitrary video training data (whether from a game or real-life) and automatically reconstruct the 3D models of all vehicles in the scene (and the skybox) and "play back" the scene in a video game engine.
This is the direction virtual and augmented reality is headed (Facebook Codec Avatar, and their room reconstruction technology).
Impressive. Makes you wonder if, at some point in the future, there won't be a game engine any more, just tons of training material, and you play in a generated dream.
Certainly impressive. And sure, maybe in a distant future. Though I think this is one of those things where creating a working prototype that is 75% complete is the "easy" part. The other 25% (which you need for an actual working product) will take forever. Like self-driving cars, nuclear fusion, etc.
I fail to see the novelty here. What's the size difference between the model and all of the 64x32 image training data? If the difference is not significant, you're basically just scrubbing a video, right?
The GAN model is the game environment. You're playing a neural network. The novelty is that there's no game engine and no rules; it just learned how to represent the game, and you can play it.
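The size comparison asked about upthread is simple arithmetic once you pick numbers. The parameter and frame counts below are invented for illustration (only the 64x32 frame size comes from the thread), but they show the shape of the argument:

```python
# Compare the model's parameter footprint with the raw training data.
params = 50_000_000                  # assumed parameter count (illustrative)
model_bytes = params * 4             # float32 weights

frames = 1_000_000                   # assumed number of training frames
frame_bytes = 64 * 32 * 3            # 64x32 RGB, per the thread
data_bytes = frames * frame_bytes

print(model_bytes // 10**6, "MB of weights")  # 200 MB
print(data_bytes // 10**6, "MB of frames")    # 6144 MB
# If the data dwarfs the weights, verbatim memorization is implausible;
# if the weights dwarf the data, the overfitting worry has more teeth.
```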
rasz | 4 years ago:
My Tiger repelling rock^^^^^^leopard detection model works great on all animal pictures ... until you feed it a sofa https://web.archive.org/web/20150703094328/http://rocknrolln...
andrepd | 4 years ago:
Is... that a bad thing? Skepticism is good. When it's about something as hyped as "deep learning", even more so.
senkora | 4 years ago:
https://m.youtube.com/watch?v=eP5hHKne_gE&feature=youtu.be
Full list of projects: https://sites.google.com/site/artml2018/showcase/final-proje...
reasonabl_human | 4 years ago:
Edit: https://www.reddit.com/r/linux/comments/mtnld7/programmer_cr...
philipswood | 4 years ago:
https://github.com/sentdex/GANTheftAuto/
4dahalibut | 4 years ago:
Do you see yourself taking this train of play further?
sentdex | 4 years ago:
There's just something about AI-based environments that is particularly intriguing!
junon | 4 years ago:
Really well done.
HerrmannM | 4 years ago:
I'm curious why you can't share the GTA5 mod and data collection script; obtaining good data is always hard.
Cheers and all the best!
slver | 4 years ago:
Unless it's a high-concept game whose very goal is offering you a dream environment.
But I do believe neural networks will get into everything. They're the last missing piece of our compute model.
black_puppydog | 4 years ago:
sudo python3 inference.py?
Really? :D