Flux is so frustrating to me. Really good prompt adherence, a strong ability to keep track of multiple parts of a scene: it's technically very impressive. However, it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance, and I can't fine-tune a painterly art style of any sort into Flux dev. I get that there was backlash at SD from working, living artists, and I can therefore imagine that the BFL team decided not to train on art, but it's a real loss, both in terms of human knowledge of, say, composition and emotion, and in terms of style diversity.
For goodness sake, the MET in New York has a massive trove of CC0-licensed art. Dear BFL, please ease up a bit on this and add some art-art to your models; they will be better as a result.
I’ve had the same problem with photography styles, even though the photographer I’m going for is Prokudin-Gorskii, who used emulsion plates in the 1910s, and the entire Library of Congress collection is in the public domain. I’m curious how they even managed to remove them from the training data, since the entire LoC is such an easy dataset to access.
And I can't imagine there's a real copyright (or ethical) issue with including artwork in the public domain because the artist died over a century ago.
I think that's part of what makes FLUX.1 so good: the content it's trained on is very similar.
Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.
I wonder if part of the reason it's good is because it's been trained for a more specific task. I can only imagine that if your concept of a "house" ranges from a stately home to "a pineapple under the sea", you're going to end up with a very generalised concept. It then takes specific prompting to remove the influences you're not interested in.
I suspect the same goes for art styles. There's such huge variety that really they'd be better served by separate models.
One thing that makes FLUX so special is the prompt understanding. I gave FLUX 1.1 the prompt "Closeup of a doll house built to resemble a famous room in the TV show Friends" and it gave me one with the sign "Central Perk". I never prompted for the text "Central Perk". A Redditor also discovered that it has an associative understanding of emotions: given "Rose of passion", for example, it may draw a flower that is burning, because passion is fiery.
This is miles ahead of most other image generation models available today.
Yet, it doesn't seem to know what a Tektronix 4010 actually looks like... ;)
I had similar issues trying to paint a "I cast non-magic missile" meme with a fantasy wizard using a missile launcher. No model out there (I've tried SD, SDXL, FLUX.1 dev, and now this FLUX 1.1 pro) knows what a missile launcher looks like (neither as a generic term nor as any specific system), and none have a clue how it's held, so they all draw really weird contraptions.
That is astoundingly good adherence to the description. I already liked and was impressed by Flux1 but that is perhaps the most impressive image generation I've ever seen.
It's quite good at following a detailed, paragraph-long description of a scene, which is a double-edged sword. A lot of the fun for me with early text-to-image models was underspecifying an image and then enjoying how the model "invents" it. "Steampunk spaceship", "communist bear", "glass city".
Flux is amazing, but I find it requires a very literal description, which pushes the "creative work" back to the text itself. Which can certainly be a good thing, just a bit less gratifying to non-visual types like myself. :)
I wonder, only somewhat jokingly, if one could make text generators which "imagine" detailed fantastical scenes, suitable for feeding to a text to image model.
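This idea exists informally as "prompt expansion": have an LLM invent the detail before the image model sees it. A minimal sketch of the first half — the instruction wording here is my own invention, not something from this thread, and the commented-out `llm`/`flux` objects are hypothetical stand-ins for whatever client you use:

```python
def build_expansion_request(seed_idea: str) -> str:
    """Turn a terse idea like 'communist bear' into an instruction asking
    an LLM to invent a detailed, literal scene description."""
    return (
        "You are an art director. Expand the following idea into one "
        "paragraph of concrete visual detail (subject, setting, lighting, "
        "composition, style), suitable as a text-to-image prompt.\n\n"
        f"Idea: {seed_idea}"
    )

# The request would then go to an LLM, and its reply to the image model,
# e.g. (hypothetical client objects, not real APIs):
#   detailed = llm.complete(build_expansion_request("glass city"))
#   image = flux.generate(detailed)
```

The appeal is that the "invention" step moves back into a model, so you can still underspecify and be surprised.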
Far more interesting will be when pony diffusion V7 launches.
No one in the image space wants to admit it, but well over half of your user base wants to generate hardcore NSFW with your models and they mostly don’t care about any other capabilities.
Ah, that was one short gravy train even by modern tech company standards. Really wish the space was more competitive and open so it wouldn't just be one company at the top locking their models behind APIs.
It doesn’t get piano keyboards right, but it’s the first image generator I’ve tried that sometimes gets “someone playing accordion” mostly right.
When I ask for a man playing accordion, it’s usually a somewhat flawed piano accordion, but if I ask for a woman playing accordion, it’s usually a button accordion. I’ve also seen a few that are half-button, half-piano monstrosities.
Also, if I ask for “someone playing accordion”, it’s always a woman.
Periodic data is always hard for generative image systems - particularly if that "cycle" window is relatively large (as would be the case for octaves of a piano).
I'm running Asahi Linux on a 32GB M1 Pro. Any chance of being able to run text-to-image models locally? I've had some success with LLMs, but only the smaller models. No idea where to start with images, everything seems geared towards msft+nvda.
"Draw Things" is a native Mac app for text to image. It's a lot more advanced than DiffusionBee, it will download the models for you, and it's free. It's also available for iOS. (!)
I'm worried about what happens when more people find out about Ideogram.
There are a lot of things that don't appear in ELO scores. For one, they will not reflect that you cannot prompt women's faces in Flux. We can only speculate why.
How locked down is it? My problem with a lot of these is I like to make really ridiculous meme-type images, but I run into walls for dumb reasons. Like if I want to make something that's "copyrighted", say a mix of certain characters from one franchise or whatever, sometimes I get told that the model cannot generate copyrighted content, even though courts have ruled that AI-generated output cannot be copyrighted either way...
I feel like AI should just be treated as fair use as long as it's not 100% blatantly a literal clone of the original work.
I've been playing with Flux.Dev, and it's such a big step forward from Stable Diffusion and all the other generative AIs that can run on consumer GPUs.
I just tried this Flux1.1 pro page (prompt: "A sad Macintosh user who is upset because his computer can't play games") and was very impressed by the detail and "understanding" this model has.
I asked for a simple scene and it drew in the exact same AI girl that every text-to-image model wants to draw, same face, same hair, so generic that a Google reverse image search pulls up thousands of the exact same AI girl. No variety of output at all.
The answer is it really depends on your hardware, but the nice thing is that you can split out the text encoder when using ComfyUI. On a 24gb VRAM card I can run the Q8_0 GGUF version of flux-dev with the T5 FP16 text encoder. The Q8_0 gguf version in particular has very little visual difference from the original fp16 models. A 1024x1024 image takes about 15 seconds to generate.
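To put numbers on why Q8_0 fits where FP16 is tight: FLUX.1-dev's transformer is the publicly stated ~12B parameters, and Q8_0 stores roughly 8.5 bits per weight (8-bit values plus a per-block scale). The rest is back-of-envelope arithmetic, ignoring the text encoder and activations:

```python
def model_size_gb(n_params_billions: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (treating 1 GB as 1e9 bytes)."""
    return n_params_billions * bits_per_param / 8

fp16 = model_size_gb(12, 16)   # -> 24.0 GB: won't fit alongside T5 on a 24 GB card
q8   = model_size_gb(12, 8.5)  # -> 12.75 GB: leaves room for the text encoder
```

This is why offloading or splitting out the T5 encoder (as ComfyUI allows) matters even at Q8_0.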
I really enjoy its service. It's promising for UI design: the UI of my advocacy website's pages was bootstrapped using it. It is quite good for developers without much design ability.
Ironically, I am afraid to type the website out and will keep it unknown here. My account could be suspended because of this. It had already reached -1 karma. It's better to keep my account alive.
The generated images look impressive of course, but I can't help but be mildly amused by the fact that the prompt for the second example image insists strongly that the image should say 1.1:
> ... photo with the text "FLUX 1.1 [Pro]", ..., must say "1.1", ...
...And of course, it does not.
Sorry to be a noob, but how does this relate to fastflux.ai which seems to work great and creates an image in less than a second? Is this a new model on a slower host?
crystal_revenge|1 year ago
I suspect we'll see the answer to this is LoRAs. Two examples that stick out are:
- Flux Tarot v1 [0]
- Flux Amateur Photography [1]
Both of these do a great job of combining all the benefits of Flux with custom styles that seem to work quite well.
[0] https://huggingface.co/multimodalart/flux-tarot-v1
[1] https://civitai.com/models/652699?modelVersionId=756149
whywhywhywhy|1 year ago
It feels like they just removed names from the datasets to make it worse at recreating famous people and artists.
DeathArrow|1 year ago
https://huggingface.co/nyanko7/flux-dev-de-distill
skort|1 year ago
But that real art still exists, and can still be found, so what exactly is the loss here?
ChrisArchitect|1 year ago
(https://news.ycombinator.com/item?id=41730626)
sharkjacobs|1 year ago
"our most advanced and efficient model yet"
"a significant step forward in our mission to empower creators"
I get it, you can't sell things if you don't market them, and you can't make a living making things if you don't sell them, but it's exhausting.
johnfn|1 year ago
https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
vunderba|1 year ago
Some comparisons against DALL-E 3.
https://mordenstar.com/blog/flux-comparisons
arizen|1 year ago
- Take your morning to the next level!
collinvandyck76|1 year ago
edit: nevermind, it's a macos app
nickthegreek|1 year ago
https://github.com/lllyasviel/stable-diffusion-webui-forge
https://www.reddit.com/r/StableDiffusion/comments/1esxkk8/ho...
doctorpangloss|1 year ago
It's about 6 lines of Python.
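The "6 lines" are presumably something like the standard Hugging Face diffusers snippet. A hedged sketch, not the commenter's actual code; it assumes `diffusers` and `torch` are installed, a capable GPU, and access to the FLUX.1-dev weights (imports are inside the function so the sketch can be read without those packages):

```python
def generate(prompt: str, out_path: str = "flux.png") -> str:
    """Generate one image with FLUX.1-dev via diffusers and save it."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower peak VRAM
    image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
    image.save(out_path)
    return out_path
```

ComfyUI or Forge wrap the same pipeline with quantized-model and encoder-splitting options that plain diffusers doesn't give you out of the box.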