Half a dozen articles on ML-based image manipulation on HN at once. Seems we're really entering a golden age of AI-based real-world applications, at least in specific niches. Personally I'm really excited about the potential of this in design, art, movies, games and interactive storytelling. Hard to imagine what will be possible 5-10 years from now, but I kind of expect RPGs with fully AI-generated aesthetics/graphics and stories, where only some core gameplay mechanics are still determined by the game's designers. Really can't wait to see that.
In any case, the work described in the linked article is also extremely impressive and feels almost unreal.
I don't know; I feel the real-world applications are still missing and what we're seeing now are tech demos (impressive ones!) and gimmicks. I'm still waiting to see all this ML stuff used in a productive context.
I think Dwarf Fortress has the story generation part; the aesthetics/graphics part, not yet. And I think it's procedurally generated, but with complex and strange results. https://www.reddit.com/r/dwarffortress/comments/2ztnkw/i_thi...
News and sports, in particular.
I don’t think the pervasiveness of ML articles on HN is an indicator of anything except hype trends around certain subjects. ML research in these spaces has been very high-output for many years now.
As someone in the field of computer graphics, where there’s been considerable ML research over the past few years that’s more reliably applicable to people’s lives, most of the exciting stuff doesn’t make it to the front page of HN even if it’s posted here.
There’s been lots of research in the past few years. The initial shiny stuff makes it on here, but it’s the follow-up iterations, the ones that actually catalyze change, that don’t, because public interest in those topics has waned in the interim.
Speaking of which, is there any good ML-based super-resolution algorithm out there? I'm trying to print a poster but some of my figures are in low resolution...
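If it helps: OpenCV's dnn_superres module (in opencv-contrib-python) can run pretrained super-resolution networks like EDSR. A minimal sketch, assuming you've downloaded the EDSR_x4.pb weights separately (e.g. from the OpenCV model zoo):

    # pip install opencv-contrib-python
    import cv2

    # Load a pretrained 4x EDSR super-resolution model (EDSR_x4.pb is
    # distributed separately, e.g. via the OpenCV model zoo).
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel("EDSR_x4.pb")
    sr.setModel("edsr", 4)  # algorithm name and upscale factor

    img = cv2.imread("figure.png")
    upscaled = sr.upsample(img)  # 4x larger in each dimension
    cv2.imwrite("figure_4x.png", upscaled)

There are fancier options (the ESRGAN family, for instance), but this one is easy to script over a batch of figures.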
> Seems we're really entering a golden age of AI-based real-world applications...
I wouldn't call moving pixels on a screen "real-world". Are these technologies one day going to have a physical effect on our lives, like, in the real real world? I very much doubt it.
Is it only me who noticed how teeth appear absolutely out of nowhere when people smile in the demo footage? And it doesn't look fascinating. It looks horrifying.
Probably because of falling into the uncanny valley [0].
[0] https://en.m.wikipedia.org/wiki/Uncanny_valley
Don't get me wrong, it's an incredible feat, and it seems to handily beat the other automagic interpolators (e.g., 3:49 in the video at the bottom of TFA) in terms of minimizing "pop-in", but it's still clearly present in dentition.
Inspired by their own Gulliver's Travels [0] example I tried it out on two frames of an anime with 15 FPS. Not quite ready for that type of animation [1], although that is to be expected since the differences in arm positions of the input frames are pretty extreme. Having said that, it got a lot of other details right!
[0] https://replicate.com/google-research/frame-interpolation/ex...
[1] https://imgur.com/6GZSZSO
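(If anyone wants to reproduce this: the hosted model can be driven from the Replicate Python client. The input field names below are my guess at the model's schema, and depending on client version you may need to pin a version hash from the model page, so check there for the exact fields.)

    # pip install replicate; needs REPLICATE_API_TOKEN set in the environment
    import replicate

    # "frame1", "frame2" and "times_to_interpolate" are assumptions about
    # the input schema; verify against the model page before running.
    output = replicate.run(
        "google-research/frame-interpolation",
        input={
            "frame1": open("frame_a.png", "rb"),
            "frame2": open("frame_b.png", "rb"),
            "times_to_interpolate": 4,  # recursive midpoints: 2^4 - 1 = 15 frames
        },
    )
    print(output)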
This feels like something that would be perfect for one-man, or small-team, animation studios. If this could draw the in-betweens, I imagine a talented artist (which I am not) could produce films in a fraction of the time it takes to draw every frame. If you're not happy with the result, just add another frame.
Hard to say, but this is (a) kind of what 3D animation already does, and (b) sort of a misunderstanding of animation.
Animated frames are supposed to convey intention. They’re fantastic at doing this since you can manipulate every detail of every frame. The idea that you’ll just run an AI through it might work for the dialogue scenes of a typical Japanese TV anime, where intention is low and it really is mostly grunt work. But I would imagine it would be a bit lifeless, unless someone trains a model specifically for anime using good animation as a reference.
Basically, just moving between two frames is an example of extremely poor animation.
Source: am animator, sort of.
Isn't this what Flash tweening already allowed 20 years ago? The technique here seems ideal for already-existing drawn images or photographs, but if you're drawing something from scratch you can provide a lot more context for interpolation by starting with vector data instead of raster frames.
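That's the crux of it: with vector keyframes the in-betweens are just interpolated control points, so nothing has to be hallucinated. A toy sketch of what a linear tween amounts to:

    # Toy tween: an in-between pose is just linear interpolation of the
    # control points of two keyframes, so no pixels need to be invented.
    def lerp_points(a, b, t):
        """Interpolate two equal-length lists of (x, y) points at 0 <= t <= 1."""
        return [(ax + (bx - ax) * t, ay + (by - ay) * t)
                for (ax, ay), (bx, by) in zip(a, b)]

    key_a = [(0.0, 0.0), (10.0, 5.0), (20.0, 0.0)]  # pose at keyframe A
    key_b = [(0.0, 2.0), (12.0, 9.0), (20.0, 4.0)]  # pose at keyframe B
    inbetweens = [lerp_points(key_a, key_b, i / 12) for i in range(1, 12)]

Real tweening engines interpolate along curves with easing rather than linearly, but the principle is the same.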
This was my first thought too, even for large studios, even for existing media. Would be neat to take an existing animation or stop-motion that was done at 12 fps and see it scaled up.
We're going to get an explosion of indie animated shows. Will soon be possible to make as a year-long passion project what used to require $15 million and network exec buy-in.
I still can't wrap my head around how people absolutely ignore kids' right to privacy by posting their photos/videos without their consent.
I would have been pretty bummed if, by my teens, I'd found out my whole life's history was out there for the world to crawl, collect, train their ad/surveillance NNs on, etc.
Don't worry, by the time this kid's old enough to even care, he'll be unrecognizable. If it's any consolation, I cannot recognize this kid as anything other than a "kid". Good-looking kid for sure, but still a kid.
Does anyone know if it's possible to run this on the Apple Silicon GPU? I've been playing with Stable Diffusion on M1 and having fun, and I'd love to be able to use this to interpolate between frames as shown in another recent post.
DAIN didn't work for me on M1: https://github.com/nihui/dain-ncnn-vulkan
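For what it's worth, the official release is TensorFlow as far as I can tell, but if you end up with one of the PyTorch reimplementations, the M1 GPU is exposed through the MPS backend (PyTorch 1.12+). A minimal sketch of the device setup, nothing model-specific:

    import torch

    # MPS is PyTorch's Metal backend for Apple Silicon (PyTorch 1.12+).
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # Anything moved with .to(device) then runs on the M1 GPU.
    x = torch.rand(1, 3, 256, 256, device=device)
    print(x.device)  # mps, or cpu as a fallback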
> synthesizes multiple intermediate frames from two input images
That's a neat use case, and definitely a good way to show off, but what about more than one image?
The overwhelming majority of video that exists today is 30 fps or lower. The overwhelming majority of displays support 60 Hz or more.
Most high-end TVs do some realtime frame interpolation, but there is only so much an algorithm can do to fill in the blanks. It doesn't take long to see artifacts.
I would be more interested to see what an ML-based approach could do with the edge cases of interpolating 30 fps video than with just two frames.
Actually, most of the video frame interpolation programs on the market use two-frame interpolation. Theoretically you can do a better job with multiple frames, but that doesn't bring much more value outside of some extreme cases.
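Right, the standard scheme is just to slide over consecutive pairs and synthesize midpoints. A rough sketch of 30 fps to 60 fps, where interpolate_pair is a placeholder for whatever two-frame model you plug in (FILM, RIFE, a TV's motion engine, ...):

    import cv2

    def double_fps(path_in, path_out, interpolate_pair):
        """Insert one synthesized frame between every consecutive pair.

        interpolate_pair(a, b) is a placeholder for any two-frame model
        that returns the midpoint frame as a numpy array.
        """
        cap = cv2.VideoCapture(path_in)
        fps = cap.get(cv2.CAP_PROP_FPS)
        size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
        out = cv2.VideoWriter(path_out, cv2.VideoWriter_fourcc(*"mp4v"),
                              fps * 2, size)
        ok, prev = cap.read()
        while ok:
            out.write(prev)
            ok, nxt = cap.read()
            if ok:
                out.write(interpolate_pair(prev, nxt))  # synthesized midpoint
                prev = nxt
        cap.release()
        out.release()

The multi-frame case mostly matters when motion is ambiguous between two frames (occlusions, direction changes), which is exactly where two-frame methods produce artifacts.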
Yep. I'd also be interested at least in A/B-ing this against current motion interpolation methods used in televisions. Does it perform perceptually better in blind viewer tests? Does it get rid of the soap opera effect? Does it have its own flavor of "something's off about this video"? All questions I'd love to see answered.
For historical footage, I could see some use cases. For cinema, I don't know why you'd want to do this. < 60 fps playback of video that was shot at < 60 fps looks just fine. Even if the interpolation was perfect, what's the benefit?
It seems like this could be a good way to provide smooth weather / cloud animations using real or raw cloud images rather than those heat maps most apps use.