seth_ | 3 years ago
Author here: fwiw, we are running the app on A10G GPUs, which can generally turn around a 512x512 image in about 3.5s at 50 inference steps. That time includes converting the image into audio, which should also be done on the GPU for real-time purposes. We did some optimizations, such as tracing the UNet, running in fp16, and removing autocast. There are lots of ways it could be sped up further, I'm sure!
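A minimal sketch of the tracing + fp16 pattern mentioned above. `TinyUNet` here is a hypothetical stand-in for the real Stable Diffusion UNet (which would come from e.g. the diffusers library); the point is just the `torch.jit.trace` + half-precision mechanics, not a faithful model:

```python
import torch

# TinyUNet is a hypothetical stand-in for the real denoising UNet,
# just to illustrate the tracing pattern. A real UNet also conditions
# on text embeddings, which are omitted here.
class TinyUNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, latents, t):
        return self.conv(latents) + t.view(-1, 1, 1, 1)

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 on GPU; fall back to fp32 on CPU where half precision is slow/unsupported.
dtype = torch.float16 if device == "cuda" else torch.float32

model = TinyUNet().to(device=device, dtype=dtype).eval()

# Example inputs: a 512x512 image corresponds to 64x64 latents in SD-style models.
example = (
    torch.randn(1, 4, 64, 64, device=device, dtype=dtype),
    torch.tensor([10.0], device=device, dtype=dtype),  # timestep
)

with torch.no_grad():
    # Tracing freezes the forward pass into a static graph,
    # cutting Python overhead on each of the ~50 denoising steps.
    traced = torch.jit.trace(model, example)

out = traced(*example)
print(out.shape)
```

Because the inputs are already cast to fp16 and the module's weights live in fp16, there is no need for an `autocast` context around the call, which matches the "removing autocast" optimization.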