seth_ | 3 years ago
Author here: fwiw, we are running the app on A10G GPUs, which can generally turn around a 512x512 image in about 3.5s at 50 inference steps. That time includes converting the image into audio, which should also be done on the GPU for real-time purposes. We did some optimizations, such as tracing the UNet, running in fp16, and removing autocast. There are lots of ways it could be sped up further, I'm sure!
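A minimal sketch of the tracing + fp16 pattern mentioned above. `TinyUNet` here is a hypothetical stand-in for the real Stable Diffusion UNet (which would come from e.g. the diffusers library); the point is just the `torch.jit.trace` + half-precision mechanics, not a faithful model:

```python
import torch

# TinyUNet is a hypothetical stand-in for the real denoising UNet,
# just to illustrate the tracing pattern. A real UNet also conditions
# on text embeddings, which are omitted here.
class TinyUNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, latents, t):
        return self.conv(latents) + t.view(-1, 1, 1, 1)

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 on GPU; fall back to fp32 on CPU where half precision is slow/unsupported.
dtype = torch.float16 if device == "cuda" else torch.float32

model = TinyUNet().to(device=device, dtype=dtype).eval()

# Example inputs: a 512x512 image corresponds to 64x64 latents in SD-style models.
example = (
    torch.randn(1, 4, 64, 64, device=device, dtype=dtype),
    torch.tensor([10.0], device=device, dtype=dtype),  # timestep
)

with torch.no_grad():
    # Tracing freezes the forward pass into a static graph,
    # cutting Python overhead on each of the ~50 denoising steps.
    traced = torch.jit.trace(model, example)

out = traced(*example)
print(out.shape)
```

Because the inputs are already cast to fp16 and the module's weights live in fp16, there is no need for an `autocast` context around the call, which matches the "removing autocast" optimization.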