item 34386052

VToonify: Controllable high-resolution portrait video style transfer

149 points | godmode2019 | 3 years ago | github.com

22 comments


nineteen999|3 years ago

Well it's cool and all, but the results sit right at the very deepest part of the uncanny valley.

Gigachad|3 years ago

It's like a tech demo, a preview of the future. Give it 5 years and it will be super refined and probably the future of low-cost animation for kids' TV shows and stuff. Then even further out: just as no one animates without a computer now, no one will animate without AI assistance.

crazygringo|3 years ago

As far as I can tell, it really depends on the intensity of the "slider".

When the intensity of the style transfer is pushed mostly to the right (high), it just seems like Pixar or cartoons. Nothing uncanny whatsoever.

But when they show it about a quarter of the way to the right... it's utter nightmare fuel, like plastic surgery taken way too far. The worst kind of uncanny valley, so I definitely agree with you there.

morjom|3 years ago

I wonder what integrating this into e.g. live streaming would require.

nmstoker|3 years ago

On Linux there are techniques involving a loopback, such as here: https://github.com/umlaeute/v4l2loopback

Effectively these let an app (e.g. some VToonify tool) generate content that, from the perspective of your live-streaming app, looks like it comes from a webcam.
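A minimal sketch of that setup, assuming ffmpeg as the frame source (the device number and card label here are illustrative, not from the comment):

```shell
# Load the loopback module, creating a virtual webcam at /dev/video10.
sudo modprobe v4l2loopback devices=1 video_nr=10 card_label="Stylized Cam"

# Feed stylized frames into it; any tool that can write to a V4L2 output
# works in place of ffmpeg. Streaming apps then list "Stylized Cam"
# alongside real webcams.
ffmpeg -re -i stylized_output.mp4 -f v4l2 -pix_fmt yuv420p /dev/video10
```

For real-time use you'd pipe the stylization tool's output straight into the device instead of playing back a pre-rendered file.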

marcAKAmarc|3 years ago

I'm glad things are progressing, but it bugs me that AI is largely being innovated for the use of... things like this? I know this comment is a bit disparaging and minimizes greater achievements, and I apologize for that, but the closeness of content-consumerism and AI is becoming quite off putting.

techdragon|3 years ago

Based on the numbers in the paper this is just a little bit too slow for use as a real time video effect. At ~0.1 seconds per frame we just need about a 3x improvement in performance to get to 30fps “real time” video frame rates.
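The arithmetic behind that estimate, taking the ~0.1 s/frame figure at face value:

```python
# Back-of-the-envelope check on the real-time claim.
frame_time = 0.1                      # seconds per frame (approximate, from the paper)
current_fps = 1 / frame_time          # -> 10 fps
target_fps = 30                       # conventional "real time" video rate
speedup_needed = target_fps / current_fps
print(speedup_needed)                 # -> 3.0, i.e. roughly a 3x improvement
```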

And on that note, since they appear to have used nVidia hardware (based on the CUDA dependency), it would be interesting to see how this performs on something like an M1/M2, where there's dedicated ML hardware to help offload and accelerate things.

speedgoose|3 years ago

The paper also says they used 8 Tesla V100s. Those are GPUs dedicated to ML workloads and quite a bit more powerful than an M2.

drusepth|3 years ago

Does M1/M2 really outperform CUDA on beefy ML GPUs in tasks like this? I'd love to see numbers if so; this seems extremely surprising.

kevingadd|3 years ago

nVidia hardware has dedicated ML silicon, though?

Uehreka|3 years ago

Sure, but that’s not too far from the 12fps that cartoon animators often actually use.