top | item 41130620

Flux: Open-source text-to-image model with 12B parameters

683 points | CuriouslyC | 1 year ago | blog.fal.ai

226 comments

[+] burkaygur|1 year ago|reply
hi friends! burkay from fal.ai here. would like to clarify that the model is NOT built by fal. all credit should go to Black Forest Labs (https://blackforestlabs.ai/) which is a new co by the OG stable diffusion team.

what we did at fal is take the model and run it on our inference engine optimized to run these kinds of models really really fast. feel free to give it a shot on the playgrounds. https://fal.ai/models/fal-ai/flux/dev

[+] metadat|1 year ago|reply
The playground is a drag. After grudgingly signing up, attaching my GitHub, and handing over my email address, I entered the desired prompt and waited with anticipation... only to see a black screen and a notice of how much it's going to cost per megapixel.

Bummer. After seeing what was generated in the blog post I was excited to try it! Now feeling disappointed.

I was hoping it'd be more like https://play.go.dev.

Good luck.

[+] Hizonner|1 year ago|reply
You also might want to "clarify" that it is not open source (and neither are any of the other "open source" models). If you want to call it something, try "open weights", although the usage restrictions make even that a HUGE FUCKING STRETCH.

Also, everybody should remember that these models are not copyrightable and you should never agree to any license for them...

[+] tikkun|1 year ago|reply
> We are excited to introduce Flux

I'd suggest re-wording the blog post intro, it reads as if it was created by Fal.

Specific phrases to change:

> Announcing Flux

(from the title)

> We are excited to introduce Flux

> Flux comes in three powerful variations:

This section also comes across as if you created it.

> We invite you to try Flux for yourself.

Reads as if you're the creator

[+] frognumber|1 year ago|reply
It would be nice to understand limits of the free tier. I couldn't find that anywhere. I see pricing, but I'm generating images without swiping my credit card.

If it's unlimited or "throttled for abuse," say that. Right now, I don't know if I can try it six times or experiment to my heart's desire.

[+] vessenes|1 year ago|reply
Congrats Burkay - the model is very impressive. One area I’d like to see improved in a flux v2 is knowledge of artist styles. Flux cannot respond to requests asking for paintings in the style of David Hockney, Norman Rockwell, or Edgar Degas — it seems to have no fine art training at all.

I’d bet that fine art training would further improve the compositional skills of the model, plus it would open up a range of uses that are (to me at least) a bit more interesting than just illustrations.

[+] dabeeeenster|1 year ago|reply
The unsubscribe links in your emails don't work.
[+] shubik22|1 year ago|reply
thanks for hosting the model! i created an account to try it out, you started emailing me with “important notice: low account balance - action required” and now it seems like there’s no way for me to unsubscribe or delete my account. is that the case? thanks!
[+] RobotToaster|1 year ago|reply
If you are using the dev model, the licence isn't open source.
[+] minimaxir|1 year ago|reply
The [schnell] model variant is Apache-licensed and is open sourced on Hugging Face: https://huggingface.co/black-forest-labs/FLUX.1-schnell

It is very fast and very good at rendering text, and appears to have a text encoder such that the model can handle both text and positioning much better: https://x.com/minimaxir/status/1819041076872908894

A fun consequence of better text rendering is that it means text watermarks from its training data appear more clearly: https://x.com/minimaxir/status/1819045012166127921

[+] nwoli|1 year ago|reply
It’s not really fair to conclude that the training data contains Vanity Fair images, since the prompt includes “by Vanity Fair”.

I could write “with text that says Shutterstock” in the prompt, but that doesn’t necessarily mean the dataset contains that.

[+] RobotToaster|1 year ago|reply
How does the licence work when there's a bunch of restrictions at the bottom of that page that seem to contradict the licence?
[+] dheera|1 year ago|reply
Thank you. Their website is super hard to navigate and I can't find a "DOWNLOAD" button.
[+] treesciencebot|1 year ago|reply
You can try the models here:

(available without sign-in) FLUX.1 [schnell] (Apache 2.0, open weights, step distilled): https://fal.ai/models/fal-ai/flux/schnell

(requires sign-in) FLUX.1 [dev] (non-commercial, open weights, guidance distilled): https://fal.ai/models/fal-ai/flux/dev

FLUX.1 [pro] (closed source [only available thru APIs], SOTA, raw): https://fal.ai/models/fal-ai/flux-pro

[+] Vinnl|1 year ago|reply
> (available without sign-in) FLUX.1 [schnell] (Apache 2.0, open weights, step distilled): https://fal.ai/models/fal-ai/flux/schnell

Well, I was wondering about bias in the model, so I entered "a president" as the prompt. Looks like it has a bias alright, but it's even more specific than I expected...

[+] RobotToaster|1 year ago|reply
What is the difference between schnell and dev? Just the kind of distillation?
[+] Aardwolf|1 year ago|reply
What's the difference between pro and dev? Is the pro one also 12B parameters? Are the example images on the site (the patagonia guy, lego and the beach potato) generated with dev or pro?
[+] layer8|1 year ago|reply
Requires sign-in with a GitHub account, unfortunately.
[+] smusamashah|1 year ago|reply
Tested it using prompts from ideogram (login-walled), which has great prompt adherence. Flux generated very very good images. I have been playing with ideogram, but I don't want their filters and want a similarly powerful system running locally.

If this runs locally, this is very very close to that in terms of both image quality and prompt adherence.

It did fail at writing text clearly when the text was a bit complicated. Take this ideogram image's prompt, for example: https://ideogram.ai/g/GUw6Vo-tQ8eRWp9x2HONdA/0

> A captivating and artistic illustration of four distinct creative quarters, each representing a unique aspect of creativity. In the top left, a writer with a quill and inkpot is depicted, showcasing their struggle with the text "THE STRUGGLE IS NOT REAL 1: WRITER". The scene is comically portrayed, highlighting the writer's creative challenges. In the top right, a figure labeled "THE STRUGGLE IS NOT REAL 2: COPY ||PASTER" is accompanied by a humorous comic drawing that satirically demonstrates their approach. In the bottom left, "THE STRUGGLE IS NOT REAL 3: THE RETRIER" features a character retrieving items, complete with an entertaining comic illustration. Lastly, in the bottom right, a remixer, identified as "THE STRUGGLE IS NOT REAL 4: THE REMI

Otherwise, the quality is great. I stopped using Stable Diffusion a long time ago; the tools and tech around it became very messy, and it's not fun anymore. I've been using ideogram for fun, but I want something like ideogram that I can run locally without any filters. This is looking perfect so far.

This is not ideogram, but it's very very good.

[+] benreesman|1 year ago|reply
Ideogram handles text really well but I don’t want to be on some weird social network.

If this thing can mint memes with captions in it on a single node I guess that’s the weekend gone.

Thanks for the useful review.

[+] ilkke|1 year ago|reply
You can run it locally in ComfyUI. I was able to run it with 12GB of VRAM, and reportedly even 8GB is doable, albeit very slowly.
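As a rough sanity check on those VRAM numbers, the weight-only footprint of a 12B-parameter model can be estimated from bytes per parameter (a sketch; real usage adds activations and the text encoders on top):

```python
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Rough weight-only memory footprint in GiB (ignores activations/encoders)."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 12B parameters at fp16/bf16 (2 bytes/param) vs. fp8 (1 byte/param)
print(round(weight_gib(12, 2), 1))  # ~22.4 GiB: weights alone exceed a 12GB card
print(round(weight_gib(12, 1), 1))  # ~11.2 GiB: why fp8 quantization or offloading makes 12GB workable
```

Which is consistent with 12GB only being enough when ComfyUI quantizes or offloads parts of the model.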
[+] seveibar|1 year ago|reply
whenever I see a new model I always see if it can do engineering diagrams (e.g. "two square boxes at a distance of 3.5mm"), still no dice on this one. https://x.com/seveibar/status/1819081632575611279

Would love to see an AI company attack engineering diagrams head on, my current hunch is that they just aren't in the training dataset (I'm very tempted to make a synthetic dataset/benchmark)
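A synthetic pair for such a dataset could be as simple as a caption plus a programmatically generated drawing. A hypothetical sketch, emitting SVG where the mm dimensions map directly to SVG user units:

```python
def diagram_svg(box_mm: float, gap_mm: float) -> str:
    """Two squares of side box_mm separated horizontally by gap_mm, as an SVG string."""
    x2 = box_mm + gap_mm            # left edge of the second box
    width = 2 * box_mm + gap_mm     # total drawing width
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}mm" height="{box_mm}mm">'
        f'<rect x="0" y="0" width="{box_mm}" height="{box_mm}" fill="none" stroke="black"/>'
        f'<rect x="{x2}" y="0" width="{box_mm}" height="{box_mm}" fill="none" stroke="black"/>'
        '</svg>'
    )

caption = "two square boxes at a distance of 3.5mm"
svg = diagram_svg(10, 3.5)  # training pair: (caption, rendered image of svg)
```

Rasterize the SVG and you have exact, programmatically verifiable ground truth for prompts like the one above.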

[+] roenxi|1 year ago|reply
It'll probably come suddenly. It has been fascinating to me watching the journey from Stable Diffusion 1 to 3. SD1 was a very crude model, where putting a word in the prompt might or might not add representations of the word to the image. Eg, using the word "hat" somewhere in the prompt might do literally nothing or suddenly there were hats everywhere. The context of the word didn't mean much to SD1.

SD2 was more consistent about the word appearing in the image. "hat" would add hats more reliably. Context started to matter a little bit.

SD3 seems to be getting a lot better at the idea of scene composition, so now specific entities can be prompted to wear hats. Not perfect, but noticeably improved from SD2.

Extrapolating from that, we're still a few generations from being able to describe things with the precision of an engineering diagram - but we're heading in the right direction at a rapid clip. I doubt there needs to be any specialist work yet, just time and the improvement of general purpose models.

[+] napoleongl|1 year ago|reply
Can’t you get this done via an LLM and have it generate code for Mermaid or D2 or something? I’ve been fiddling around with that a bit in order to create flowcharts and data models, and I’m pretty sure I’ve seen at least one of those languages handle absolute positioning of objects.
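I can't vouch for Mermaid supporting absolute coordinates, but in the same spirit, Graphviz's neato engine does: a `!` suffix on a node's `pos` attribute pins it at an exact coordinate. A minimal sketch of the "two boxes" idea (coordinates are in points, not mm):

```dot
// Render with: neato -Tsvg boxes.dot -o boxes.svg
graph boxes {
    node [shape=square, width=0.4, fixedsize=true];
    // "!" pins the node at exactly this coordinate instead of letting the layout move it
    a [pos="0,0!"];
    b [pos="50,0!"];   // 50pt to the right of a
}
```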
[+] tantalor|1 year ago|reply
Seems to do pretty poorly with spatial relationships.

"An upside down house" -> regular old house

"A horse sitting on a dog" -> horse and dog next to each other

"An inverted Lockheed Martin F-22 Raptor" -> yikes https://fal.media/files/koala/zgPYG6SqhD4Y3y_E9MONu.png

[+] minimaxir|1 year ago|reply
It appears the model does have some "sanity" restrictions from its training process that limit some of the super weird outputs.

"A horse sitting on a dog" doesn't work but "A dog sitting on a horse" works perfectly.

[+] bboygravity|1 year ago|reply
a zebra on top of an elephant worked fine for me
[+] PoignardAzur|1 year ago|reply
Am I missing something? The beach image they give still fails to follow the prompt in major ways.
[+] swatcoder|1 year ago|reply
You're not. I'm surprised at their selections, because neither the cooking one nor the beach one adheres to the prompt very well, and the first one only does because its prompt largely avoids detail altogether. Overall, the announcement gives the sense that it can make pretty pictures but not very precise ones.
[+] perstablintome|1 year ago|reply
The quality is difficult to judge consistently, as there's variance among seeds with the same prompt. And then there's the problem of cherry-picked examples making the news. So I'm building a community gallery to generate Pro images for free; hope this at least increases the sample size: https://fluxpro.art/
[+] SV_BubbleTime|1 year ago|reply
Wow.

I have seen a lot of promises made by diffusion models.

This is in a whole different world. I legitimately feel bad for the people still at StabilityAI.

The playground testing is really something else!

The licensing model isn’t bad, although I would like to see them promise to open up their old closed source models under Apache when they release new API versions.

The prompt adherence, and the breadth of topics it seems to know without a finetune and without any LoRAs, is really amazing.

[+] Havoc|1 year ago|reply
Bit of an annoying signup... GitHub only... and GitHub account creation is currently broken ("Something went wrong"). Took two tries and two browsers...
[+] fernly|1 year ago|reply
I had the same "something went wrong" experience, but on retrying the "sign in to run" button, it was fine and had logged me in.

Gave me a credit of 2 USD to play with.

[+] vunderba|1 year ago|reply
The vast majority of comparisons aren't really putting these new models through their paces.

The best prompt adherence on the market right now BY FAR is DALL-E 3 but it still falls down on more complicated concepts and obviously is hugely censored - though weirdly significantly less censored if you hit their API directly.

I quickly mocked up a few weird/complex prompts and did some side-by-side comparisons with Flux and DALL-E 3. Flux is impressive and notably fast, particularly since both the dev/schnell models have been confirmed by Black Forest to be runnable via ComfyUI.

https://mordenstar.com/blog/flux-comparisons

[+] harrisonjackson|1 year ago|reply
Your comparisons are all with the flux schnell model

> The fastest image generation model tailored for local development and personal use

Versus flux pro or dev models

[+] Der_Einzige|1 year ago|reply
How long until nsfw fine tunes? Don’t pretend like it’s not on all of y’all’s minds, since over half of all the models on Civit.ai are NSFW. That’s what folks in the real world actually do with these models.
[+] throwoutway|1 year ago|reply
> Nearby, anthropomorphic fruits play beach volleyball.

This is missing from the image. The generated image looks good, but having read the prompt I was surprised to see it left out.

[+] fl0id|1 year ago|reply
Mmmh, trying my recent test prompts, it's still pretty shit. E.g., whereas Midjourney or SD have no problem creating a pencil sketch, with this model (pro) it always looks more like a black-and-white photograph or digital illustration or render. Like all the others, it's also apparently unable to follow instructions on the position of characters (i.e., X and Y are turned away from each other).
[+] viraptor|1 year ago|reply
Censored a bit, but not completely. I can get occasional boobs out of it, but sometimes it just gives the black output.
[+] refulgentis|1 year ago|reply
This gives you no info on how the model works. What is being applied is fal's post-inference "is this NSFW?" filter model.

So your censorship investigation (via boobs) is testing a completely different, unrelated, model.

[+] yjftsjthsd-h|1 year ago|reply
> FLUX.1 [dev]: The base model, open-sourced with a non-commercial license

...then it's not open source. At least the others are Apache 2.0 (real open source) and correctly labeled proprietary, respectively.

[+] cwoolfe|1 year ago|reply
Hey, great work over at fal.ai to run this on your infrastructure and for building in a free $2 in credits to try before buying. For those thinking of running this at home, I'll save you the trouble: Black Forest Flux did not run easily on my Apple Silicon MacBook at this time. (Please let me know if you have gotten this to run on similar hardware.) Specifically, it falls back to using the CPU, which is very slow. Changing the device to 'mps' causes the error "BFloat16 is not supported on MPS".
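A common workaround for that error (a sketch, assuming float16 output quality is acceptable) is to select the dtype per backend instead of hardcoding bfloat16:

```python
def pick_dtype(device: str) -> str:
    """Choose a torch dtype name per backend; the MPS backend rejects bfloat16."""
    if device == "mps":
        return "float16"    # Apple Silicon: bf16 unsupported here, fp16 is
    if device == "cuda":
        return "bfloat16"
    return "float32"        # CPU fallback: slow but always works

# Hypothetical usage with a diffusers-style pipeline:
#   pipe = FluxPipeline.from_pretrained(model_id,
#                                       torch_dtype=getattr(torch, pick_dtype("mps")))
```

Whether fp16 matches bf16 quality for this model is an open question; newer PyTorch releases have also been adding bf16 support on MPS, so this may depend on your version.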
[+] zarmin|1 year ago|reply
WILD

Photo of teen girl in a ski mask making an origami swan in a barn. There is caption on the bottom of the image: "EAT DRUGS" in yellow font. In the background there is a framed photo of obama

https://i.imgur.com/RifcWZc.png

Donald Trump on the cover of "Leopards Ate My Face" magazine

https://i.imgur.com/6HdBJkr.png