Hi friends! Burkay from fal.ai here. I'd like to clarify that the model is NOT built by fal. All credit should go to Black Forest Labs (https://blackforestlabs.ai/), a new company from the OG Stable Diffusion team.
What we did at fal is take the model and run it on our inference engine, which is optimized to run these kinds of models really, really fast. Feel free to give it a shot in the playground: https://fal.ai/models/fal-ai/flux/dev
The playground is a drag. After being forced to sign up, attach my GitHub, and hand over my email address, I entered my prompt and waited with anticipation... only to see a black screen and a note about how much it's going to cost per megapixel.
Bummer. After seeing what was generated in the blog post I was excited to try it! Now feeling disappointed.
You also might want to "clarify" that it is not open source (and neither are any of the other "open source" models). If you want to call it something, try "open weights", although the usage restrictions make even that a HUGE FUCKING STRETCH.
Also, everybody should remember that these models are not copyrightable and you should never agree to any license for them...
It would be nice to understand the limits of the free tier. I couldn't find that anywhere. I see pricing, but I'm generating images without swiping my credit card.
If it's unlimited or "throttled for abuse," say that. Right now, I don't know if I can try it six times or experiment to my heart's desire.
Congrats Burkay - the model is very impressive. One area I’d like to see improved in a Flux v2 is knowledge of artist styles. Flux cannot respond to requests asking for paintings in the style of David Hockney, Norman Rockwell, or Edgar Degas; it seems to have no fine-art training at all.
I’d bet that fine art training would further improve the compositional skills of the model, plus it would open up a range of uses that are (to me at least) a bit more interesting than just illustrations.
Thanks for hosting the model! I created an account to try it out, and you started emailing me with “important notice: low account balance - action required”. Now it seems like there’s no way for me to unsubscribe or delete my account. Is that the case? Thanks!
It is very fast and very good at rendering text, and appears to have a text encoder such that the model can handle both text and positioning much better: https://x.com/minimaxir/status/1819041076872908894
Well, I was wondering about bias in the model, so I entered "a president" as the prompt. Looks like it has a bias alright, but it's even more specific than I expected...
What's the difference between pro and dev? Is the pro one also 12B parameters? Are the example images on the site (the Patagonia guy, the Lego, and the beach potato) generated with dev or pro?
Tested it using prompts from ideogram (login-walled), which has great prompt adherence. Flux generated very, very good images. I have been playing with ideogram, but I don't want their filters and want a similarly powerful system running locally.
If this runs locally, it is very, very close to that in terms of both image quality and prompt adherence.
> A captivating and artistic illustration of four distinct creative quarters, each representing a unique aspect of creativity. In the top left, a writer with a quill and inkpot is depicted, showcasing their struggle with the text "THE STRUGGLE IS NOT REAL 1: WRITER". The scene is comically portrayed, highlighting the writer's creative challenges. In the top right, a figure labeled "THE STRUGGLE IS NOT REAL 2: COPY ||PASTER" is accompanied by a humorous comic drawing that satirically demonstrates their approach. In the bottom left, "THE STRUGGLE IS NOT REAL 3: THE RETRIER" features a character retrieving items, complete with an entertaining comic illustration. Lastly, in the bottom right, a remixer, identified as "THE STRUGGLE IS NOT REAL 4: THE REMI
Otherwise, the quality is great. I stopped using Stable Diffusion a long time ago; the tools and tech around it became very messy, and it's not fun anymore. I've been using ideogram for fun, but I want something like ideogram that I can run locally without any filters. This is looking perfect so far.
whenever I see a new model I always see if it can do engineering diagrams (e.g. "two square boxes at a distance of 3.5mm"), still no dice on this one. https://x.com/seveibar/status/1819081632575611279
Would love to see an AI company attack engineering diagrams head on, my current hunch is that they just aren't in the training dataset (I'm very tempted to make a synthetic dataset/benchmark)
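Since the shapes and distances in such diagrams are known exactly, a synthetic dataset like that could be generated programmatically. A minimal sketch of what a generator might look like; the function names, the mm-to-pixel scale, and the caption template are all made up for illustration:

```python
# Sketch: a tiny synthetic "engineering diagram" dataset generator.
# Each sample pairs an SVG of two squares at an exact distance with a
# caption like "two square boxes at a distance of 3.5mm".
import random

PX_PER_MM = 10  # assumed rendering scale

def make_sample(side_mm: float, gap_mm: float) -> dict:
    """Render two squares of side `side_mm`, separated by `gap_mm`,
    as SVG, paired with a caption stating the exact distance."""
    side = side_mm * PX_PER_MM
    gap = gap_mm * PX_PER_MM
    x2 = 10 + side + gap  # left edge of the second box
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{x2 + side + 10}" height="{side + 20}">'
        f'<rect x="10" y="10" width="{side}" height="{side}" '
        f'fill="none" stroke="black"/>'
        f'<rect x="{x2}" y="10" width="{side}" height="{side}" '
        f'fill="none" stroke="black"/>'
        '</svg>'
    )
    return {"caption": f"two square boxes at a distance of {gap_mm}mm",
            "svg": svg}

def make_dataset(n: int, seed: int = 0) -> list:
    """Generate n (caption, svg) pairs with randomized sizes and gaps."""
    rng = random.Random(seed)
    return [make_sample(round(rng.uniform(2, 10), 1),
                        round(rng.uniform(1, 5), 1))
            for _ in range(n)]
```

Rasterizing the SVGs would then give image/caption pairs where adherence can be scored exactly, since the ground-truth geometry is known.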
It'll probably come suddenly. It has been fascinating to me watching the journey from Stable Diffusion 1 to 3. SD1 was a very crude model, where putting a word in the prompt might or might not add representations of the word to the image. Eg, using the word "hat" somewhere in the prompt might do literally nothing or suddenly there were hats everywhere. The context of the word didn't mean much to SD1.
SD2 was more consistent about the word appearing in the image. "hat" would add hats more reliably. Context started to matter a little bit.
SD3 seems to be getting a lot better at the idea of scene composition, so now specific entities can be prompted to wear hats. Not perfect, but noticeably improved from SD2.
Extrapolating from that, we're still a few generations from being able to describe things with the precision of an engineering diagram - but we're heading in the right direction at a rapid clip. I doubt there needs to be any specialist work yet, just time and the improvement of general purpose models.
Can’t you get this done via an LLM and have it generate code for Mermaid or D2 or something? I’ve been fiddling around with that a bit in order to create flowcharts and data models, and I’m pretty sure I’ve seen at least one of those languages handle absolute positioning of objects.
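The LLM-to-diagram-code route could be sketched roughly as below. Mermaid flowcharts have no absolute positioning, so this hypothetical helper falls back to encoding the exact distance as an edge label; a layout language with real coordinates would be needed for true-to-scale output:

```python
# Sketch: instead of asking an image model for pixels, emit code for a
# text-to-diagram tool. Here a parsed spec (box count, distance) becomes
# Mermaid flowchart source with the distance as an edge label.
def boxes_to_mermaid(n_boxes: int, distance_mm: float) -> str:
    lines = ["flowchart LR"]
    for i in range(n_boxes - 1):
        a = chr(ord("A") + i)      # node ids A, B, C, ...
        b = chr(ord("A") + i + 1)
        lines.append(f"    {a}[box] ---|{distance_mm}mm| {b}[box]")
    return "\n".join(lines)
```

An LLM would produce the spec (or the Mermaid source directly), and the renderer guarantees the geometry, which is exactly what diffusion models currently can't.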
You're not. I'm surprised at their selections, because neither the cooking one nor the beach one adheres to the prompt very well, and the first one only does because its prompt largely avoids detail altogether. Overall, the announcement gives the sense that it can make pretty pictures but not very precise ones.
The quality is difficult to judge consistently, as there are variations among seeds with the same prompt. And then there's the problem of cherry-picked examples making the news. So I'm building a community gallery to generate Pro images for free; hope this at least increases the sample size: https://fluxpro.art/
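One way to make that sample size systematic is to generate a fixed grid of seeds per prompt and judge whole batches rather than single images. A sketch; the payload builder below is plain Python, while the commented-out call assumes fal's Python client (`fal_client.subscribe`) and the `fal-ai/flux/dev` endpoint linked above, and the exact argument names are an assumption:

```python
# Sketch: build one request payload per seed so every run (or every
# candidate model) sees the exact same (prompt, seed) pairs.
def seed_grid(prompt: str, seeds=range(8), image_size="square_hd"):
    """Return a list of request payloads, one per seed."""
    return [{"prompt": prompt, "seed": s, "image_size": image_size}
            for s in seeds]

# Assumed usage against fal's hosted endpoint (not verified here):
# import fal_client
# for args in seed_grid("a red cube on a glass table"):
#     fal_client.subscribe("fal-ai/flux/dev", arguments=args)
```

Fixing the seed list also makes comparisons across models reproducible, which helps against accidental cherry-picking as much as deliberate cherry-picking.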
I have seen a lot of promises made by diffusion models.
This is in a whole different world. I legitimately feel bad for the people still at StabilityAI.
The playground testing is really something else!
The licensing model isn’t bad, although I would like to see them promise to open up their old closed source models under Apache when they release new API versions.
The prompt adherence, and the breadth of topics it seems to know without a finetune and without any LoRAs, is really amazing.
The vast majority of comparisons aren't really putting these new models through their paces.
The best prompt adherence on the market right now is, BY FAR, DALL-E 3, but it still falls down on more complicated concepts and is obviously heavily censored - though, weirdly, significantly less censored if you hit their API directly.
I quickly mocked up a few weird/complex prompts and did some side-by-side comparisons between Flux and DALL-E 3. Flux is impressive and performs well, particularly since both the dev and schnell models have been confirmed by Black Forest to be runnable via ComfyUI.
How long until nsfw fine tunes? Don’t pretend like it’s not on all of y’all’s minds, since over half of all the models on Civit.ai are NSFW. That’s what folks in the real world actually do with these models.
Mmmh, trying my recent test prompts, still pretty shit. E.g., whereas Midjourney or SD have no problem creating a pencil sketch, with this model (Pro) it always looks more like a black-and-white photograph, digital illustration, or render. Like all the others, it is also apparently not able to follow instructions on the position of characters (e.g., X and Y are turned away from each other).
Hey, great work over at fal.ai running this on your infrastructure and building in a free $2 in credits to try before buying. For those thinking of running this at home, I'll save you the trouble: Black Forest Flux did not run easily on my Apple Silicon MacBook at this time. (Please let me know if you have gotten this to run on similar hardware.) Specifically, it falls back to using the CPU, which is very slow. Changing the device to 'mps' causes the error "BFloat16 is not supported on MPS".
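The dtype fallback that error implies can be sketched as a small helper. The mapping is an assumption (float16 on MPS can still hit numerical issues with some models, and this is untested against the actual FLUX weights), and plain strings stand in for the torch dtypes a real script would use:

```python
# Sketch: choose a dtype per device, since bfloat16 raises on the MPS
# backend while it is the usual choice on recent NVIDIA GPUs.
def pick_dtype(device: str) -> str:
    if device == "cuda":
        return "bfloat16"  # fine on recent NVIDIA GPUs
    if device == "mps":
        return "float16"   # bfloat16 is not supported on MPS
    return "float32"       # CPU fallback: slow but safe
```

In a diffusers-style script this would feed the `torch_dtype` argument when loading the pipeline, so the same code could run on CUDA, MPS, or CPU.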
Photo of teen girl in a ski mask making an origami swan in a barn. There is caption on the bottom of the image: "EAT DRUGS" in yellow font. In the background there is a framed photo of obama
metadat | 1 year ago:
I was hoping it'd be more like https://play.go.dev.
Good luck.
tikkun | 1 year ago:
I'd suggest rewording the blog post intro; it reads as if it was created by Fal.
Specific phrases to change:
> Announcing Flux
(from the title)
> We are excited to introduce Flux
> Flux comes in three powerful variations:
This section also comes across as if you created it.
> We invite you to try Flux for yourself.
Reads as if you're the creator
nextos | 1 year ago:
This library is quite well known, 3rd most starred project in Julia: https://juliapackages.com/packages?sort=stars.
It has been around since at least 2016: https://github.com/FluxML/Flux.jl/graphs/code-frequency.
minimaxir | 1 year ago:
A fun consequence of better text rendering is that it means text watermarks from its training data appear more clearly: https://x.com/minimaxir/status/1819045012166127921
nwoli | 1 year ago:
I could write “with text that says Shutterstock” in the prompt, but that doesn’t necessarily mean the dataset contains that.
treesciencebot | 1 year ago:
(available without sign-in) FLUX.1 [schnell] (Apache 2.0, open weights, step distilled): https://fal.ai/models/fal-ai/flux/schnell
(requires sign-in) FLUX.1 [dev] (non-commercial, open weights, guidance distilled): https://fal.ai/models/fal-ai/flux/dev
FLUX.1 [pro] (closed source [only available through APIs], SOTA, raw): https://fal.ai/models/fal-ai/flux-pro
smusamashah | 1 year ago:
It did fail at writing text clearly when the text was a bit complicated. This ideogram image's prompt, for example: https://ideogram.ai/g/GUw6Vo-tQ8eRWp9x2HONdA/0
This is not ideogram, but it's very, very good.
benreesman | 1 year ago:
If this thing can mint memes with captions in them on a single node, I guess that’s the weekend gone.
Thanks for the useful review.
dagaci | 1 year ago:
See: https://www.reddit.com/r/StableSwarmUI/comments/1ei86ar/flux... (SwarmUI is cross-platform and runs on Macs and Linux)
tantalor | 1 year ago:
"An upside down house" -> regular old house
"A horse sitting on a dog" -> horse and dog next to eachother
"An inverted Lockheed Martin F-22 Raptor" -> yikes https://fal.media/files/koala/zgPYG6SqhD4Y3y_E9MONu.png
minimaxir | 1 year ago:
"A horse sitting on a dog" doesn't work but "A dog sitting on a horse" works perfectly.
fernly | 1 year ago:
It gave me a credit of 2 USD to play with.
vunderba | 1 year ago:
https://mordenstar.com/blog/flux-comparisons
harrisonjackson | 1 year ago:
> The fastest image generation model tailored for local development and personal use
Versus the Flux pro or dev models.
throwoutway | 1 year ago:
This is missing from the image. The generated image looks good, but while reading the prompt I was surprised it was missing.
refulgentis | 1 year ago:
So your censorship investigation (via boobs) is testing a completely different, unrelated model.
yjftsjthsd-h | 1 year ago:
...then it's not open source. At least the others are Apache 2.0 (real open source) and correctly labeled proprietary, respectively.
zarmin | 1 year ago:
https://i.imgur.com/RifcWZc.png
Donald Trump on the cover of "Leopards Ate My Face" magazine
https://i.imgur.com/6HdBJkr.png