
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

248 points | meander_water | 2 months ago | github.com

46 comments


mishu2|2 months ago

Having the ability to do real-time video generation on a single workstation GPU is mind blowing.

I'm currently hosting a video generation website, also on a single GPU (with a queue), which is something I didn't think possible a few years ago either (my Show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.

iberator|2 months ago

Computer games have been doing it for decades already.

jjcm|2 months ago

Looks like there is some quality reduction, but nonetheless 2s to generate a 5s video on a 5090 for WAN 2.1 is absolutely crazy. Excited to see more optimizations like this moving into 2026.

avaer|2 months ago

Efficient realtime video diffusion will revolutionize the way people use computers even more so than LLMs.

I actually think we are already there with quality, but nobody is going to wait 10 minutes to do a task with video that takes 2 seconds with text.

If Sora/Kling/whatever ran cool locally 24/7 at 60FPS, would anyone ever build a UI? Or a (traditional) OS?

I think it's worth watching the scaling graph.

villgax|2 months ago

That’s not the actual time if you run it; encoding and decoding are extra.

bsenftner|2 months ago

Video AI acceleration is tricky: many of the acceleration LoRAs and cache-level accelerations currently in use have an impact on the generated video that is subtle at first but makes them poison for video work. The AIs become dumber to the point that they can't follow camera directions, character performances suffer, the lip sync becomes a lip flap, and body motions degrade in quality and become repetitive.

Now, I've not tested TurboDiffusion yet, but I am very actively generating AI video; I probably produced half an hour of finished video clips yesterday. There is no test for this issue yet, and most people have yet to realize it is an issue.

fcpk|2 months ago

Out of curiosity, what do you do with the footage? Personally, I've found it fun for the occasional situational joke video or some small background animations, but not so useful as a whole. I understand it's nice for things like making sketches from scripts and quick prototyping, but I'm genuinely curious what the use is :)

codingbuddy|2 months ago

We are scarily close to real-time personalization of video, which, if you agree with this NeurIPS paper [1], may lead to someone inadvertently creating “digital heroin”.

[1] https://neurips.cc/virtual/2025/loc/san-diego/poster/121952

hapticmonkey|2 months ago

> We further urge the machine learning community to act proactively by establishing robust design guidelines, collaborating with public health experts, and supporting targeted policy measures to ensure responsible and ethical deployment

We’ve seen this play out before, when social media first came to prominence. I’m too old and cynical to believe anything will happen. But I really don’t know what to do about it at a personal level. Even if I refuse to engage with this content, and am able to identify it, and keep my family away from it…it feels like a critical mass of people in my community/city/country are going to be engaging with it. It feels hopeless.

lysace|2 months ago

Potentially interesting that the authors are primarily affiliated with NatWest, a British bank. I had to Google their names to find that out, though.

They highlight reduced workplace productivity as a risk, among other things.

numpad0|2 months ago

It saddens me to think that the efforts so far haven't been it. Maybe I should try my hand at "closing the loop" for image generation models.

Could it destroy society? Humanity has lived through a bunch of actual such substances and always got bored of them in a matter of decades... those risk talks feel a bit overblown to me.

jjmarr|2 months ago

Infinite Jest predicted this.

redundantly|2 months ago

Now if someone could release an optimization like this for the M4 Max, I would be so happy. Last time I tried generating a video, it took something like an hour for a 480p 5-second clip.

jimmydoe|2 months ago

maybe wait for M5 Max and new MLX.

villgax|2 months ago

I mean, the baselines were deliberately worse and not how anyone would actually use these (maybe noobs would). And the quoted number covers only the DiT steps, not the other encoding and decoding steps, which are actually still quite high. No actual use of FA4/CUTLASS-based kernels or TRT at any point.
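(For context on the point above: a video diffusion pipeline's wall-clock time is roughly text encoding + DiT denoising steps + VAE decoding, so quoting only the denoising time understates end-to-end latency. A toy accounting with entirely made-up numbers, not measurements from TurboDiffusion:)

```python
# Toy latency model for a video diffusion pipeline.
# All numbers are hypothetical, for illustration only.
dit_steps = 4        # accelerated denoising step count (assumed)
t_per_step = 0.5     # seconds per DiT step (assumed)
t_text_encode = 0.3  # text-encoder time in seconds (assumed)
t_vae_decode = 1.5   # VAE decode time in seconds (assumed)

t_dit = dit_steps * t_per_step                   # headline "generation time"
t_total = t_text_encode + t_dit + t_vae_decode   # actual wall-clock time

print(f"DiT-only: {t_dit:.1f}s, end-to-end: {t_total:.1f}s")
```

With these numbers the DiT-only figure is 2.0 s while the end-to-end time is 3.8 s, which is the gap villgax is pointing at.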

sroussey|2 months ago

I want to use this on a website!