item 34865092

synapticpaint | 3 years ago

Here is a preliminary test for video editing using ControlNet I made: https://www.youtube.com/watch?v=u52MOA4YaGk

As you can see, there is still quite a bit of flicker, I'm working to reduce that. But the consistency is much better compared to, say, img2img.

I'm hoping to ship a prototype this week.
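ControlNet itself doesn't enforce any frame-to-frame consistency, so one simple post-hoc trick for damping flicker (a hypothetical sketch, not necessarily what OP's prototype does) is an exponential moving average over the edited frames:

```python
import numpy as np

def temporal_blend(frames, alpha=0.6):
    """Blend each edited frame with the previous blended output.

    frames: iterable of (H, W, 3) float arrays in [0, 1], e.g. hypothetical
            per-frame ControlNet/img2img outputs.
    alpha:  weight of the current frame; lower alpha flickers less but
            smears fast motion more.
    """
    out = []
    prev = None
    for f in frames:
        # First frame passes through; later frames are averaged with history.
        blended = f if prev is None else alpha * f + (1.0 - alpha) * prev
        out.append(blended)
        prev = blended
    return out
```

This trades flicker for ghosting on fast motion, which is why it's only a crude baseline compared to approaches with real temporal modeling.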


guiambros|3 years ago

Haven't read the paper yet, but curious how different ControlNet is from Text2LIVE ([1], [2]). Seems it's solving the same temporal-consistency problem, no?

[1] https://www.youtube.com/watch?v=8U9o5aZ2y5w

[2] https://text2live.github.io/

synapticpaint|3 years ago

No, ControlNet wasn't made to solve temporal consistency, it was made to add more control (hence the name) to image models. I am using it in a way that the authors may not have thought of, because the paper doesn't mention video editing.
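As described here, ControlNet conditions the image model on an auxiliary structure map (edges, pose, depth, etc.) extracted from each input frame. A toy sketch of deriving such a control signal (the real canny variant uses a proper Canny detector; this finite-difference version just illustrates the idea):

```python
import numpy as np

def edge_control_map(gray, threshold=0.1):
    """Crude gradient-magnitude edge map from a grayscale frame.

    gray: (H, W) float array in [0, 1]. The returned binary map is the kind
    of structural conditioning signal a ControlNet-style model consumes,
    one map per video frame.
    """
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:] = gray[:, 1:] - gray[:, :-1]  # horizontal finite difference
    gy[1:, :] = gray[1:, :] - gray[:-1, :]  # vertical finite difference
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return (mag > threshold).astype(np.float32)
```

Because the control map follows the source footage frame by frame, the generated structure tracks the video even though each frame is still denoised independently, which is presumably where the improved consistency over plain img2img comes from.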

meghan_rain|3 years ago

S-so the girl on the right in the second half of the video is not real...?

refulgentis|3 years ago

Really curious about this too, OP: is the face generated? How do you keep temporal consistency with that?

thom|3 years ago

Impressive that it doesn't just do pose transfer but applies the correct inverse kinematics too (hand on the wall/rail etc.).

synapticpaint|3 years ago

Yes, it's asking SD for an image with some set of characteristics, and SD has some notion of what is plausible from what it saw during training.

jpeter|3 years ago

Can you mask the background out?