I came to the same conclusion as the authors after generating 1000s of thumbnails[1]. OpenAI alters faces too much and smooths out details by default. NanoBanana is the best but lacks a high-fidelity option. SeeDream is catching up to NanoBanana and is sometimes better. It's been too long since OpenAI's gpt-image-1 came out; I hope they launch a better model soon.

[1] https://thumbnail.ai/
I am probably at 50k-60k image generations from various models.
It is just very hard to make any generalizations because any single prompt will lead to so many different types of images.
The only thing I would really say to generalize is every model has strengths and weaknesses depending on what you are going for.
It is also generally very hard to explore all the possibilities of a model. So many times I thought I had seen what the model could do, only to be completely blown away by a particular generation.
I don't know if you looked at the same article as I did, but NanoBanana seems to be the worst by far at following the prompts. Just look at the heat map images.
I am a small-time YouTuber. Do you run thumbnail.ai? I would really like to try it, but I'm not going to pay before I've seen even a single generated thumbnail in my context. Is it unviable to let people generate at least a few thumbnails before they have to decide whether to pay?
I run a fairly comprehensive model comparison site (generative and editing). In my experience:
NanoBanana and Flux Kontext are the models that get closest to traditional SDXL inpainting techniques (sketched below for reference).
Seedream is a strong contender by virtue of its ability to natively handle higher resolutions (up to around 4 megapixels), so you lose less detail; however, it also tends to alter the color palette more often than not.
Finally, GPT-image-1 (yellowish filter notwithstanding) exhibits very strong prompt adherence but will almost always change a number of the details.
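For anyone unfamiliar with the reference point, here is a minimal sketch of traditional SDXL inpainting via Hugging Face diffusers; the checkpoint and file names are illustrative, not anything from the article:

    # Traditional SDXL inpainting: only the masked region is regenerated;
    # the rest of the image is carried over pixel-for-pixel.
    import torch
    from diffusers import AutoPipelineForInpainting
    from PIL import Image

    pipe = AutoPipelineForInpainting.from_pretrained(
        "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = Image.open("room.png").convert("RGB")
    mask = Image.open("mask.png").convert("L")  # white = area to repaint

    result = pipe(prompt="a leather armchair", image=image, mask_image=mask).images[0]
    result.save("room_inpainted.png")

That preservation guarantee is what the editing models above are being measured against.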
It's interesting to me that the models often have their "quirks". GPT has the orange tint, but it's also much worse at staying consistent with details. Gemini has a problem where it often returns the image unchanged or almost unchanged, to the point where I gave up on using it for editing anything. Not sure if Seedream has a similar defining "feature".
They noted the Gemini issue too:
> Especially with photos of people, Gemini seems to refuse to apply any edits at all
Nano Banana in general cannot do style transfer effectively unless the source image/subject is in a similar style to the target style, which is an interesting and unexpected model quirk. Even the documentation examples unintentionally demonstrate this.
Seedream will always alter the global color balance with edits.
I've definitely noticed Gemini's tendency to return the image basically unchanged, but not noticed it being worse or better for images of people. When I tested by having it change aspects of a photo of me, I found it was far more likely to cooperate when I'd specify, for instance, "change the hair from long to short" rather than "Make the hair short" (the latter routinely failed completely).
It also helped to specify which other parts should not be changed, otherwise it was rather unpredictable about whether it would randomly change other aspects.
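In case it's useful, a trivial helper that encodes both habits; the function and wording are my own, nothing Gemini-specific:

    # Hypothetical prompt builder for image edits: state the change as an
    # explicit before -> after, and pin down what must stay untouched.
    def edit_prompt(subject: str, attribute: str, before: str, after: str,
                    keep: list[str]) -> str:
        prompt = f"Change the {attribute} of the {subject} from {before} to {after}."
        if keep:
            prompt += " Keep " + ", ".join(keep) + " exactly as in the original image."
        return prompt

    print(edit_prompt("person", "hair", "long", "short",
                      keep=["the face", "the clothing", "the background"]))
    # Change the hair of the person from long to short. Keep the face,
    # the clothing, the background exactly as in the original image.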
I have had that problem with Nano Banana, but when it works I find it so much better than the others for editing an image. Since it's free I usually try it first, and I would say approximately 10% of the time I find myself having to use something else.
I’m editing mostly pics of food and beverages though, it wouldn’t surprise me if it is situationally better or worse.
If you don’t want your image to look like it’s been marinated in nicotine, throw stuff like “neutral white background, daylight balanced lighting, no yellow tint” into your prompt. Otherwise, congrats on your free vintage urine filter.
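If you're calling the API rather than the chat UI, you can bake the suffix in. A minimal sketch, assuming the OpenAI Python SDK and gpt-image-1; the guard wording is the trick above, the rest is illustrative:

    # Append an anti-color-cast guard to every prompt before calling the API.
    # Assumes OPENAI_API_KEY is set in the environment.
    from openai import OpenAI

    client = OpenAI()
    GUARD = ", neutral white background, daylight balanced lighting, no yellow tint"

    def generate(prompt: str) -> str:
        result = client.images.generate(
            model="gpt-image-1",
            prompt=prompt + GUARD,
            size="1024x1024",
        )
        return result.data[0].b64_json  # gpt-image-1 returns base64 image data

    img_b64 = generate("product photo of a ceramic mug on a wooden table")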
They don't want you creating images that mimic either works of other artists to an extent that's likely to confuse viewers (or courts), or that mimic realistic photographs to an extent that allows people to generate low-effort fake news. So they impose an intentionally-crappy orange-cyan palette on everything the model generates.
Peak quality in terms of realistic color rendering was probably the initial release of DALL-E 3. Once they saw what was going to happen, they fixed that bug fast.
Found OpenAI too often heavy-handed. On balance, I'd probably pick Gemini narrowly over Seedream and just learn that sometimes Gemini needs a more specific prompt.
I like that they call OpenAI's image generator groundbreaking and then explain that it's prone to taking eight times longer to generate an image, before showing it add a third cat over and over and over again.
Is it just me, or does ChatGPT change subtle, or sometimes more prominent, things? Like the position of a hand holding a ball, facial features of the head, background trees, and the like?
Timings were measured on a consumer internet connection in Japan (Fiber connection, 10 Gbps nominal bandwidth) during a limited test run in a short time period.
"consumer internet connection in Japan", "10 Gbps nominal bandwidth"
Coming from a third world country, that surprises me.
The 10 Gbit connection costs me ¥5,000/mo (around USD 30/mo), which is actually slightly cheaper than what I was paying for 1 Gbit...
The main issue is latency and bandwidth across the oceans, since Asia is far away from the US where a lot of servers live. And even for services that are distributed, I live in a rural prefectural capital of Japan, 1000 km away from Tokyo where all the "Japan" data centers are, so my ping is always unimpressive despite the bandwidth.
The shortcut to flip between models in an expanded view is nice, but the original image should also be included as one of the things to flip between, and should be included in the side by side view.
Love the optimism
Honestly, I think it was ill-founded. As a photographer and artist myself, I find the OpenAI results head-and-shoulders above the others. It's not perfect, and in a few cases one or the other alternative did better, but if I had to pick one, it would be OpenAI for sure. The gap between their aesthetics and mine makes me question ever using their other products (which is purely academic since I'm not an Apple person).
It's disturbing how the models sometimes alter the objects in the images when they're only supposed to add an effect. That's not just a complete failure of the task, it also means manual work, since a human has to double-check every detail in every image.
Using gen AI for filters is stupid: a filter guarantees the same object, just filtered, while a gen AI version of it guarantees nothing, plus an expensive AI bill.
It's like using gen AI to do math instead of extracting the numbers from a story and just doing the math with +, -, / and *.
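To make the contrast concrete: a filter is a fixed function of the pixels, so the output is fully determined by the input. A toy sepia filter with Pillow (my example, not from the article):

    # Deterministic sepia: identical input pixels always give identical
    # output pixels, so the objects in the scene cannot change.
    from PIL import Image

    def sepia(img: Image.Image) -> Image.Image:
        gray = img.convert("L")
        palette = []
        for level in range(256):  # map each gray level to a fixed sepia tone
            palette.extend((level * 240 // 255, level * 200 // 255, level * 145 // 255))
        gray.putpalette(palette)  # attaches the palette, yielding a "P" image
        return gray.convert("RGB")

    sepia(Image.open("photo.jpg")).save("photo_sepia.jpg")

A generative "apply sepia" edit gives you no such guarantee about anything outside the color mapping.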
Well, that is a good point. That is for everyone to decide for themselves, I suppose.
I like to think in terms of how often a model failed versus succeeded. So what I did is look at the worst result each time. To me, the one which stood out (negatively) is Gemini. OpenAI had some very good results but also some that missed the mark. SeeDream (which I had never heard of previously) missed the mark less often than Gemini, and at times where OpenAI failed, SeeDream came out clearly on top.
So, if I were to use the effects of the mentioned models, I wouldn't bother with Gemini; only OpenAI and SeeDream.
This seems to imply that the capabilities being tested correspond to the descriptive words used in the prompts, but using random words would be just as valid for exercising the extents of the underlying math. When I think of that, I wonder why a list of tests like this should be interesting, and to what end. The repeated iteration implies that some control or better quality is being sought, but the mechanism of exploration is just trial and error, and it is not informative of what would be repeatable success for anyone else in any other circumstance.
Hey. We'd love to fund the generations for free for you to try Riverflow 2 out, if you're up for it. Riverflow 1 ranks above them all, and 2 is now in preview this week.
Are people doing image generation really using these models much? I've generated a lot of images, but I always use ComfyUI with local models and custom workflows. I only have 8GB of VRAM and I can easily do 1000s of images per day if I want to.
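For scale, this is roughly what a local batch run looks like outside ComfyUI; a sketch with Hugging Face diffusers, where the checkpoint id and prompts are placeholders for whatever you have locally:

    # Batch generation with a local SD 1.5 checkpoint via diffusers.
    # fp16 weights keep VRAM use well under 8 GB at 512x512.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "sd-legacy/stable-diffusion-v1-5",  # or any SD 1.5 checkpoint on disk
        torch_dtype=torch.float16,
    ).to("cuda")

    prompts = ["a lighthouse in fog", "a brutalist greenhouse at dusk"]
    for i, prompt in enumerate(prompts):
        image = pipe(prompt, num_inference_steps=25).images[0]
        image.save(f"out_{i:04d}.png")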
I dunno about you lot, but I actually really like Stable Diffusion 1.5.
I like giving it weird non-prompts, like lines from songs or novels. I then run it for a few hundred generations locally and do stuff with the malformed shit it comes out with. I have a few art projects like this.
Aphex Twin vibes.