
We ran over 600 image generations to compare AI image models

204 points | kalleboo | 4 months ago | latenitesoft.com

103 comments

[+] justhw|4 months ago|reply
I came to the same conclusion as the authors after generating 1000s of thumbnails[1]. OpenAI alters faces too much and smooths out details by default. NanoBanana is the best but lacks a high-fidelity option. SeeDream is catching up to NanoBanana and is sometimes better. It's been a long time since OpenAI's gpt-image-1 came out; I hope they launch a better model soon.

[1] https://thumbnail.ai/

[+] Libidinalecon|4 months ago|reply
I am probably at 50k-60k image generations from various models.

It is just very hard to make any generalizations because any single prompt will lead to so many different types of images.

The only thing I would really say to generalize is every model has strengths and weaknesses depending on what you are going for.

It is also generally very hard to explore all the possibilities of a model. So many times I thought I'd seen what the model could do, only to be completely blown away by a particular generation.

[+] malfist|4 months ago|reply
I don't know if you looked at the same article as I did, but NanoBanana seems to be the worst by far at following the prompts. Just look at the heat map images.
[+] sd9|4 months ago|reply
Do you run thumbnail.ai? I would really like to try it, but I'm not going to pay before I've seen even a single generated thumbnail in my context. Is it unviable to let people generate at least a few thumbnails before they have to decide whether to pay?

I'm a small-time YouTuber.

[+] vunderba|4 months ago|reply
I run a fairly comprehensive model comparison site (generative and editing). In my experience:

NanoBanana and Flux Kontext are the models that get closest to traditional SDXL inpainting techniques.

Seedream is a strong contender by virtue of its ability to natively handle higher resolutions (up to around 4 megapixels), so you lose less detail - however, it also tends to alter the color palette more often than not.

Finally, GPT-image-1 (yellowish filter notwithstanding) exhibits very strong prompt adherence but will almost always change a number of the details.

[+] gs17|4 months ago|reply
It's interesting to me that the models often have their "quirks". GPT has the orange tint, but it also is much worse at being consistent with details. Gemini has a problem where it often returns the image unchanged or almost unchanged, to the point where I gave up on using it for editing anything. Not sure if Seedream has a similar defining "feature".

They noted the Gemini issue too:

> Especially with photos of people, Gemini seems to refuse to apply any edits at all

[+] minimaxir|4 months ago|reply
Nano Banana in general cannot do style transfer effectively unless the source image/subject is in a similar style to the target style, which is an interesting and unexpected model quirk. Even the documentation examples unintentionally demonstrate this.

Seedream will always alter the global color balance with edits.

[+] dwringer|4 months ago|reply
I've definitely noticed Gemini's tendency to return the image basically unchanged, but not noticed it being worse or better for images of people. When I tested by having it change aspects of a photo of me, I found it was far more likely to cooperate when I'd specify, for instance, "change the hair from long to short" rather than "Make the hair short" (the latter routinely failed completely).

It also helped to specify which other parts should not be changed, otherwise it was rather unpredictable about whether it would randomly change other aspects.
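The phrasing pattern described above can be captured in a tiny helper. This is a hypothetical sketch (the function name and the default "keep" clause are my own, and which phrasing actually works is model-dependent); it just builds the explicit "change X from A to B" instruction that reportedly succeeds where "make X B" fails:

```python
def edit_prompt(attribute: str, before: str, after: str,
                keep: tuple = ("everything else",)) -> str:
    """Build an explicit before/after edit instruction, plus a clause
    naming what should stay unchanged (both tips from the comment above).
    Hypothetical helper; effectiveness varies by model."""
    keep_clause = "; keep " + " and ".join(keep) + " unchanged"
    return f"Change the {attribute} from {before} to {after}{keep_clause}."

print(edit_prompt("hair", "long", "short", keep=("the face", "the background")))
```

The same template extends to any attribute edit; the point is that the model is given both the source and target state, not just the target.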

[+] bird0861|4 months ago|reply
Check out Mask Banana - you might have better luck with using masks to get image models to pay attention to what you want edited.
[+] mattmaroon|4 months ago|reply
I have had that problem with nano banana but when it works I find it so much better than the others for editing an image. Since it’s free I usually try it first, and I would say approximately 10% of the time find myself having to use something else.

I’m editing mostly pics of food and beverages though, it wouldn’t surprise me if it is situationally better or worse.

[+] frotaur|4 months ago|reply
It's crazy that the 'piss filter' of OpenAI image generation hasn't been fixed yet. I wonder if it's on purpose for some reason?
[+] DeathArrow|4 months ago|reply
If you don’t want your image to look like it’s been marinated in nicotine, throw stuff like “neutral white background, daylight balanced lighting, no yellow tint” into your prompt. Otherwise, congrats on your free vintage urine filter.
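A minimal sketch of that tip as a reusable prompt helper. The function name is hypothetical and the counter-phrases are taken verbatim from the comment above; whether they actually suppress the tint for a given model is not guaranteed:

```python
def detint_prompt(prompt: str) -> str:
    """Append color-cast countermeasures (the phrases suggested above)
    to an image-generation prompt. Hypothetical helper; results vary by model."""
    fixes = [
        "neutral white background",
        "daylight balanced lighting",
        "no yellow tint",
    ]
    return prompt.rstrip(".") + ". " + ", ".join(fixes) + "."

print(detint_prompt("A bowl of ramen on a wooden table."))
```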
[+] CamperBob2|4 months ago|reply
They don't want you creating images that mimic either works of other artists to an extent that's likely to confuse viewers (or courts), or that mimic realistic photographs to an extent that allows people to generate low-effort fake news. So they impose an intentionally-crappy orange-cyan palette on everything the model generates.

Peak quality in terms of realistic color rendering was probably the initial release of DALL-E 3. Once they saw what was going to happen, they fixed that bug fast.

[+] beezle|4 months ago|reply
Found OpenAI too often heavy-handed. On balance, I'd probably pick Gemini narrowly over Seedream and just learn that sometimes Gemini needs a more specific prompt.
[+] Dwedit|4 months ago|reply
You can always identify the OpenAI result because it's yellow.
[+] Bombthecat|4 months ago|reply
And Midjourney because it's cel shading :)
[+] jrflowers|4 months ago|reply
I like that they call OpenAI's image generator ground-breaking, then explain that it's prone to taking eight times longer to generate an image, before showing it add a third cat over and over and over again.
[+] kalleboo|4 months ago|reply
I meant to say it was ground-breaking when it was released, the other models came later.
[+] fsniper|4 months ago|reply
Is it just me, or does ChatGPT change subtle (or sometimes more prominent) things? Like the position of a hand holding a ball, the facial features of a head, background trees, and the like?
[+] qayxc|4 months ago|reply
It's not you. The model seems to refuse to accurately reproduce details. It changes things and leaves stuff out every time.
[+] amanverasia|4 months ago|reply
> Timings were measured on a consumer internet connection in Japan (Fiber connection, 10 Gbps nominal bandwidth) during a limited test run in a short time period.

A "consumer internet connection in Japan" with "10 Gbps nominal bandwidth"? Coming from a third-world country, that surprises me.

[+] kalleboo|4 months ago|reply
The 10gbit connection costs me ¥5,000/mo (around USD 30/mo), which was actually slightly cheaper than I was paying for 1 Gbit...

The main issue is latency and bandwidth across the oceans, since Asia is far away from the US, where a lot of servers live. Even for services that are distributed, I live in a rural prefectural capital of Japan, 1000 km away from Tokyo, where all the "Japan" data centers are, so my ping is always unimpressive despite the bandwidth.

[+] angry_albatross|4 months ago|reply
The shortcut to flip between models in an expanded view is nice, but the original image should also be included as one of the things to flip between, and should be included in the side by side view.
[+] jstummbillig|4 months ago|reply
> If you made it all the way down here you probably don’t need a summary

Love the optimism

[+] LogicFailsMe|4 months ago|reply
I skipped to the end to see if they did any local models. spoilers: they didn't.
[+] CWuestefeld|4 months ago|reply
Honestly, I think the article's conclusion was ill-founded. As a photographer and artist myself, I find the OpenAI results head-and-shoulders above the others. It's not perfect, and in a few cases one or another alternative did better, but if I had to pick one, it would be OpenAI for sure. The gap between their aesthetics and mine makes me question ever using their other products (which is purely academic since I'm not an Apple person).
[+] emsign|4 months ago|reply
It's disturbing how the models sometimes alter the objects in the images when they're only supposed to add an effect. That's not just a complete failure of the task, it also means manual work, since a human has to double-check every detail in every image.
[+] yapyap|4 months ago|reply
Using gen AI for filters is stupid: a filter guarantees the same object but filtered, while a gen-AI version guarantees nothing and comes with an expensive AI bill.

It's like using gen AI to do math instead of extracting the numbers from a story and just doing the math with +, -, / and *.
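That analogy can be made concrete with a toy sketch (the function name and the simple sum are my own illustration): the extraction step can be fuzzy, but once the numbers are out, the arithmetic itself is deterministic and free.

```python
import re

def sum_from_story(story: str) -> float:
    """Deterministically pull the numbers out of a word problem and add them,
    instead of asking a generative model to do the arithmetic. Toy example:
    only handles plain decimal numbers and only sums them."""
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", story)]
    return sum(numbers)

print(sum_from_story("Alice had 3 apples, bought 4.5 more, and found 2."))  # 9.5
```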

[+] alienbaby|4 months ago|reply
Interesting experiment, though I'm not certain quite how the models are usefully compared.
[+] Fnoord|4 months ago|reply
Well, that is a good point. That is for everyone themselves to decide, I suppose.

To me, it makes sense to think in terms of how often a model failed versus succeeded, so what I did was look each time at the worst result. The one that stood out (negatively) was Gemini. OpenAI had some very good results but also some that missed the mark. SeeDream (which I had never heard of previously) missed the mark less often than Gemini, and at times where OpenAI failed, SeeDream came out clearly on top.

So, if I were to use the effects of the mentioned models, I wouldn't bother with Gemini; only OpenAI and SeeDream.

[+] th0ma5|4 months ago|reply
This seems to imply that the capabilities being tested correspond to the descriptive words used in the prompts, but as a category, random words would be just as valid for exercising the extents of the underlying math. When I think of that reality, I wonder why a list of tests like this should be interesting, and to what end. The repeated iteration implies that some control or better quality is being sought, but the mechanism of exploration is just trial and error, and it isn't informative about what would be repeatable success for anyone else in any other circumstance given these discoveries.
[+] kevin009|4 months ago|reply
Every day I generate more than 600 images and compare them too; it takes me 5 hours.
[+] chanw|4 months ago|reply
Hey. We'd love to fund the generations for free for you to try Riverflow 2 out, if you're up for it. Riverflow 1 ranks above them all, and 2 is now in preview this week.
[+] lunias|4 months ago|reply
Are people doing image generation really using these models much? I've generated a lot of images, but I always use ComfyUI with local models and custom workflows. I only have 8GB of VRAM and I can easily do 1000s of images per day if I want to.
[+] specproc|4 months ago|reply
I dunno about you lot, but I actually really like Stable Diffusion 1.5.

I like giving it weird non-prompts, like lines from songs or novels. I then run it for a few hundred generations locally and do stuff with the malformed shit it comes out with. I have a few art projects like this.

Aphex Twin vibes.

[+] cwoolfe|4 months ago|reply
ChatGPT is the only one I've found that can transform an image into a specified size, e.g. "resize this image to be 1280x1024 pixels".
[+] MrBuddyCasino|4 months ago|reply
Would have loved to see Grok (xAI) in there; in my (limited) experience it is often better than OpenAI or Gemini.