
Fooocus: OSS for image generation by ControlNet author

186 points| dvrp | 2 years ago |github.com

54 comments

[+] erwannmillon|2 years ago|reply
"Native refiner swap inside one single k-sampler. The advantage is that now the refiner model can reuse the base model's momentum (or ODE's history parameters) collected from k-sampling to achieve more coherent sampling. In Automatic1111's high-res fix and ComfyUI's node system, the base model and refiner use two independent k-samplers, which means the momentum is largely wasted, and the sampling continuity is broken. Fooocus uses its own advanced k-diffusion sampling that ensures seamless, native, and continuous swap in a refiner setup."

This is so interesting and seems obvious in retrospect, but super impressive! The code is simple too, going to hack around with this over the weekend :)
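The idea in the quoted passage can be illustrated with a toy sketch. This is not Fooocus's actual k-diffusion code; the function names, the linear-multistep update, and the stand-in models are all hypothetical. The point is that one loop keeps a single shared history across the base-to-refiner swap, whereas two independent samplers would each start with an empty history.

```python
# Hypothetical sketch: refiner swap inside ONE sampling loop, so the
# multistep history (the "momentum") collected with the base model is
# still available to the refiner after the swap.
def sample(base_model, refiner_model, x, steps, swap_at):
    history = []  # ODE history, shared across both models
    for step in range(steps):
        model = base_model if step < swap_at else refiner_model
        eps = model(x, step)
        history.append(eps)
        # toy 2nd-order linear-multistep update reusing the previous eps;
        # with two separate samplers, history would be reset at swap_at
        if len(history) >= 2:
            eps = 1.5 * history[-1] - 0.5 * history[-2]
        x = x - eps / steps
    return x
```

With two separate samplers, the refiner's first steps would fall back to the first-order branch; here they immediately use the base model's history.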

[+] Klaster_1|2 years ago|reply
As a frontend developer, this reads to me as technobabble you'd find in entertainment media. In general, I learn about things not directly related to my sphere of interests by osmosis, but this is on another level. Reminds me of the time when I started my computing journey. I wonder if I'll be able to understand this eventually just by reading a relevant comment or blog here and there.
[+] liuliu|2 years ago|reply
Definitely interesting. It seems that I accidentally implemented that in Draw Things.
[+] airgapstopgap|2 years ago|reply
> Linux and Mac

> Coming soon ...

Ah well. Hopefully it is soon. Also, on behalf of all Apple Silicon Mac users, would be nice if the author looked into implementing Metal FlashAttention [1].

1. https://github.com/philipturner/metal-flash-attention

[+] sunpazed|2 years ago|reply
I can get this running on my M1 Mac Pro (CPU mode) with a few tweaks in the cv2win32.py file. I'll submit a change request.
[+] politelemon|2 years ago|reply
For those who don't know, ControlNet is often used in conjunction with Stable Diffusion. It lets you add extra conditions to guide what is being generated. There are extensions for Automatic1111's stable diffusion webui that can make use of ControlNet. Some examples I've seen: copying the pose of a person or animal in an image and generating a different person with the same pose (and extending that to video), or taking line-art drawings and filling them in with a style.

https://stable-diffusion-art.com/controlnet/

[+] kashunstva|2 years ago|reply
> Learned from Midjourney, the manual tweaking is not needed, and users only need to focus on the prompts and images

Except prompt-based tweaking doesn’t work very well in MJ; certainly not as well as manually-directed in-painting and out-painting. It’s virtually impossible in MJ to hold one part of the image constant while adding to/modifying the remainder.

[+] AbraKdabra|2 years ago|reply
Those commits are something else.
[+] k3liutZu|2 years ago|reply
Oh wow.

I am not sure what I would have expected upon reading this comment, but I was not prepared.

[+] Hard_Space|2 years ago|reply
Interesting, and I look forward to using it, but I wish the distribution had kept the folder-name conventions of AUTOMATIC1111, so that we could more easily have used symbolic links for folders of LoRAs and checkpoints etc. that we'd rather not duplicate.
[+] erikprotagonist|2 years ago|reply
Apparently it uses the folder structure of ComfyUI - I just symlinked the models folder from that and it worked with no issues. (I also reused my ComfyUI venv; I just had to pip install pygit2 to make it work.)
[+] andybak|2 years ago|reply
Can't you symlink individual files? More effort but only a quick bit of scripting away.

(I've occasionally used a duplicate file eliminator that finds dups over a certain size and replaces them with symlinks. You can run it on an entire subtree or drive)

[+] dvrp|2 years ago|reply
i’m sure there are ways around it no?
[+] GaggiX|2 years ago|reply
The names given to the commits are... peculiar.
[+] wruza|2 years ago|reply
Tbh I'm (loosely) following commit message best practices in all of my projects out of an irrational fear of being seen as unprofessional. But I've never needed that effing prose in my workflow; maybe a keyword from time to time. I use code, not messages, to navigate history, and only on rare occasions. If all my messages turned into "i" I'd lose nothing, because all the rationales and essentials are in code comments. I'd rather the log showed dates (and grouped by them) by default, and I'd find a commit by grepping patch contents rather than messages.
[+] yellow_postit|2 years ago|reply
Definitely the smoothest install process I've come across, and it's relatively snappy on my local Windows machine. I do hope to see some ControlNet integration, as that's become a key part of my workflow for exploring new images.
[+] ilkke|2 years ago|reply
Would you be willing to share a bit how you use controlnet in your exploration workflow?

My biggest discovery so far is using shuffle to guide the output style (and curating a folder of great style guide images).

[+] captn3m0|2 years ago|reply
Are there ways to run such apps with a remote GPU over network? I want to run the UI on my laptop, but use my homeserver GPU from the local network.

Anything better than X forwarding?

[+] freeone3000|2 years ago|reply
This project serves results over HTTP. You get what you want by running the server binary on the server, then accessing it with your web browser.
[+] sorenjan|2 years ago|reply
Just like I expected, I get this error when trying to run it on my AMD GPU...

"RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx"

Maybe it can be modified to use DirectML? Although it looks like it's using PyTorch 2.0, and I think torch-directml only supports 1.13. Why are ML and GPGPU such a dependency mess?

[+] bufferoverflow|2 years ago|reply
The sample image on the github page doesn't look great. Major problems with the eyes, something both SD and MJ have solved, for the most part.
[+] AuryGlenz|2 years ago|reply
Without something like Adetailer or the ComfyUI equivalent it’s kind of useless to do anything with relatively small faces.

For those that don’t know, the Adetailer extension for Auto1111 does a second pass on faces at a higher resolution and then inpaints them back in.
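The crop-upscale-inpaint-paste loop described above can be sketched with plain arrays. This is a hypothetical illustration, not Adetailer's actual code: the face box is assumed to be already detected, and `refine` is a stand-in for the diffusion inpainting pass.

```python
import numpy as np

def adetailer_pass(image, box, scale=4, refine=lambda x: x):
    """Sketch of an Adetailer-style second pass: crop a detected face box,
    upscale it, run the model on the crop at higher resolution, then
    downscale and paste the result back into the image."""
    y0, y1, x0, x1 = box
    crop = image[y0:y1, x0:x1]
    up = np.kron(crop, np.ones((scale, scale)))  # naive nearest-neighbour upscale
    refined = refine(up)                          # stand-in for the diffusion inpaint
    down = refined.reshape(y1 - y0, scale, x1 - x0, scale).mean(axis=(1, 3))
    out = image.copy()
    out[y0:y1, x0:x1] = down
    return out
```

The gain comes from the model seeing the face at, say, 512px instead of 64px; the paste-back step is what the original comment calls inpainting them back in.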

[+] natch|2 years ago|reply
Great steps. I would still like to see something offline that can blend two disparate images into one generated scene, like artbreeder has.
[+] dvrp|2 years ago|reply
i know! it’s hard to make offline because of gpu requirements
[+] brucethemoose2|2 years ago|reply
I wonder if some of this can be ported to HF diffusers.

Lots of the changes just... make sense.