> I found this claiming an A100 can generate 1 image/s.

The article you linked is over a year old. Needless to say there have been a LOT of optimizations in the last year.

Back then it was common to use 50+ steps for many of the common samplers. Current methods use a few steps like 1. OnnxStream here is using SDXL-Turbo, and you can combine LCM and a few other methods to go very fast.

The reason it's so much faster now is that OnnxStream is only using a single step.

This repo claims 149 images/s on a 4090 https://github.com/aifartist/ArtSpew

However, even if you only get 1 image/s with whatever GPU you have, I stand by my original statement: unless you want to do it for the cool factor (which is very valid), pre-calculating them makes more sense.
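To put rough numbers on the pre-calculation argument, here is a back-of-the-envelope sketch. The batch size of 10,000 images is a made-up figure purely for illustration, and the two rates are just the 1 image/s and 149 images/s numbers mentioned above, taken at face value.

```python
def precompute_seconds(n_images: int, images_per_second: float) -> float:
    """Wall-clock seconds to pre-generate n_images at a steady throughput."""
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return n_images / images_per_second


N_IMAGES = 10_000  # hypothetical batch size, not from the thread

slow = precompute_seconds(N_IMAGES, 1.0)    # the "only 1 image/s" case
fast = precompute_seconds(N_IMAGES, 149.0)  # the claimed 4090 throughput

print(f"at 1 image/s:    {slow / 3600:.1f} hours")
print(f"at 149 images/s: {fast:.0f} seconds")
```

Even at the slow rate this is a one-off batch job of a few hours on a GPU, after which the pi only has to display files.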


godelski|2 years ago

> This repo claims 149 images/s on a 4090

I actually get around 100 imgs/s on my 3080Ti. Three things to note: 1) you gotta run the max perf code to get the high throughput, 2) the images in this setting are absolute garbage, 3) you don't save the images so you're going to have to edit the code to extract them.

Definitely agree that this project is much more about the cool factor. I suggested a GAN in another comment for similar reasoning (because it's a pi...), but if you want quality images, well, I'm not sure why anyone would expect to get those out of a pi. High quality images take time and skill. But it's also HN, and I'm all for doing things for the cool factor (as long as we don't sell them as things they aren't. ML is cool enough that it doesn't need all the hype and garbage).

dragonwriter|2 years ago

> Back then it was common to use 50+ steps for many of the common samplers. Current methods use a few steps like 1.

The "look how fast we can go" method (turbo model with 1 step and without CFG) is blindingly fast, but the quality is...nothing close to what was being done in normal 50+ step gens with normal settings.

Realistically, even with Turbo+LCM, you're still going to 4+ steps (often 8+), with CFG, for reasonable one-generation quality anywhere close to the images people generated at 50+ steps without Turbo/LCM.

Which is still a big improvement in speed.
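A rough way to see how big: count denoiser forward passes per image. The sketch below assumes each step costs one pass, that CFG doubles it (one conditional plus one unconditional pass, which is how CFG is usually implemented), and that the old 50-step runs used CFG. It ignores the text encoder, VAE decode, batching of the two CFG passes, and any per-pass cost difference between models, so treat the ratios as ballpark only.

```python
def denoiser_passes(steps: int, cfg: bool) -> int:
    """Approximate forward passes per image: one per step, doubled with CFG."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return steps * (2 if cfg else 1)


# Assumed baseline: 50 steps with CFG (the "normal settings" case).
baseline = denoiser_passes(steps=50, cfg=True)

# Settings mentioned in the thread.
configs = {
    "turbo, 1 step, no CFG": denoiser_passes(1, cfg=False),
    "turbo+LCM, 4 steps, CFG": denoiser_passes(4, cfg=True),
    "turbo+LCM, 8 steps, CFG": denoiser_passes(8, cfg=True),
}

for name, passes in configs.items():
    print(f"{name}: {passes} passes, ~{baseline / passes:.2f}x fewer than baseline")
```

Under those assumptions the 1-step demo is about 100x cheaper than the baseline, while the 4 to 8 step settings land somewhere around 6x to 12x.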

filterfiber|2 years ago

> Realistically, even with Turbo+LCM, you're still going to 4+ steps (often 8+), with CFG, for reasonable one-generation quality anywhere close to the images people generated at 50+ steps without Turbo/LCM.

For sure. The only reason I considered comparing it that way was because the linked repo appears to also be going for a similar approach, with 1 step/image on the pi.

From my own experience I've had a hard time ever getting a decent image below 6~8 steps, but this repo seems more focused on getting it to run in a reasonable amount of time at all, which understandably requires the minimal "maybe passable" settings.