top | item 41930932

(no title)

Nice! I peeked at the code and thought I’d share a few tips for improving the low frame rate:

Base64 encoding the JPEG bytes increases payload size by roughly 33% and burns CPU cycles on both client and server. This is unnecessary, as the WebSocket protocol can send binary payloads (it doesn’t need to be text).
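
To put a number on the Base64 overhead, here is a rough stdlib-only sketch. The 300,000-byte random buffer is just a stand-in I made up for one JPEG frame, not something measured from your project:

```python
import base64
import os

# Stand-in for one encoded JPEG frame (assumed size, real frames will differ).
jpeg_bytes = os.urandom(300_000)

# Text path: Base64 emits 4 output bytes for every 3 input bytes.
as_text = base64.b64encode(jpeg_bytes)

# Binary path: the bytes object would be handed to the WebSocket send call as-is.
as_binary = jpeg_bytes

overhead = len(as_text) / len(as_binary) - 1
print(f"binary: {len(as_binary)} bytes, base64: {len(as_text)} bytes (+{overhead:.1%})")
```

Most WebSocket libraries I know of send a binary frame when given a bytes object and a text frame when given a str, but check the docs for whichever one you are using.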

Consider dropping the lossy JPEG compression as well, i.e. just send the raw RGB bytes over the network. Then on the server side you can simply call Image.frombuffer(…).
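
Raw pixels carry no dimensions, so the receiver needs to know the frame size somehow. One way to do that is a small fixed header in front of the pixel data. This is only a sketch: the 4-byte width/height header and the names pack_frame and unpack_frame are my own invention, not anything from the project.

```python
import struct

# Hypothetical header: width and height as unsigned 16-bit, network byte order.
HEADER = struct.Struct("!HH")

def pack_frame(width: int, height: int, rgb: bytes) -> bytes:
    """Client side: prefix raw RGB pixels with their dimensions."""
    if len(rgb) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    return HEADER.pack(width, height) + rgb

def unpack_frame(message: bytes) -> tuple[int, int, bytes]:
    """Server side: split a binary message back into dimensions and pixels."""
    width, height = HEADER.unpack_from(message)
    rgb = message[HEADER.size:]
    if len(rgb) != width * height * 3:
        raise ValueError("payload length does not match header")
    return width, height, rgb

# With Pillow installed, the server could then wrap the pixels without a decode step:
#   from PIL import Image
#   image = Image.frombuffer("RGB", (width, height), rgb, "raw", "RGB", 0, 1)
```

If the resolution never changes during a session, you could skip the header and agree on the size once up front instead.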

StreamDiffusion can achieve high frame rates because of extensive batching in the pipeline. You’re not benefiting from that here, as the client only sends one frame at a time and then waits for a response. See this example for an idea of how to queue input frames and consume them in batches: https://github.com/cumulo-autumn/StreamDiffusion/blob/main/e...
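
The general shape of "queue frames, consume in batches" looks something like the sketch below. To be clear, this is not StreamDiffusion's API or the linked example, just a generic stdlib pattern. The function name next_batch, the batch size, and the timeout are all assumptions for illustration.

```python
import queue

def next_batch(frames: queue.Queue, max_batch: int, timeout: float = 0.05) -> list:
    """Block for one frame, then take whatever else is already queued."""
    # Raises queue.Empty if no frame arrives within the timeout.
    batch = [frames.get(timeout=timeout)]
    while len(batch) < max_batch:
        try:
            batch.append(frames.get_nowait())
        except queue.Empty:
            break  # nothing more waiting, run with a partial batch
    return batch

# The receive loop would put() frames as they arrive; here we fake five of them.
frames = queue.Queue(maxsize=16)
for i in range(5):
    frames.put(f"frame-{i}")

first = next_batch(frames, max_batch=4)   # a full batch of four
second = next_batch(frames, max_batch=4)  # the one leftover frame
print(first, second)
```

The important change is on the client: it has to keep sending frames without waiting for each response, otherwise the queue never holds more than one item and every batch has size one.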

Alternatively you could take a look at the SDXL Turbo and Lightning models. They are very fast at img2img but have a limited resolution of 512² or 1024² pixels respectively, which appears a bit lower than what you’re aiming for here. On the other hand, they can run locally in real time on a high-end consumer-grade GPU. For reference I have some code demonstrating this here: https://github.com/GradientSurfer/Draw2Img/tree/main


bambax|1 year ago

Ok, but I wonder if it really needs to be real time like this? Wouldn't it make more sense to have some kind of button: somebody makes a pose, takes a picture, the picture is run through some kind of transformation and comes back as a painting that stays there until someone takes another picture? Wouldn't the illusion of art be better that way? (It would not be a "mirror" anymore though.)

roland35|1 year ago

I think it has to be either real time or a very low framerate, like once every 30 seconds. That way you have time to see each "painting"

MrLeap|1 year ago

yeah yeah yeah, do all these things, and afterwards, look at 2d interpolation methods that don't require AI for your inbetweens. There's some real fast kernel math that can lerp from one blob to another at 8 billion fps.
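
For anyone wondering what the lerp looks like, here is a bare-bones sketch on raw pixel bytes. The function name lerp_frames is made up, and pure Python like this is slow. A real version would presumably do the same arithmetic with vectorized array math or a GPU kernel.

```python
def lerp_frames(a: bytes, b: bytes, t: float) -> bytes:
    """Blend two same-sized frames of 8-bit channel values: t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of bytes")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    # Per-channel linear interpolation, rounded to the nearest integer.
    return bytes(int(x + (y - x) * t + 0.5) for x, y in zip(a, b))

# Generate three inbetweens between two generated frames (tiny fake frames here).
frame_a = bytes([0, 100, 200])
frame_b = bytes([100, 100, 0])
inbetweens = [lerp_frames(frame_a, frame_b, k / 4) for k in (1, 2, 3)]
```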

enjeyw|1 year ago

I think you’re getting downvoted because “yeah yeah yeah” is normally a sign that someone is sarcastically dismissing an idea, but the rest of your comment suggests you’re not at all - linerp is a great idea!