For reference, have you seen the Artificial Analysis Image Arena Leaderboard? It also shows you two images from anonymized models (the model names are revealed after you vote) and calculates crowdsourced Elo ratings.
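For anyone curious, the crowdsourced Elo ratings these arenas compute boil down to the standard pairwise Elo update applied to each vote. A minimal sketch, assuming a plain Elo scheme with a fixed K-factor (the function and parameter names here are illustrative, not the leaderboard's actual code):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return new (r_a, r_b) ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)       # A's expected score
    s_a = 1.0 if a_won else 0.0          # A's actual score this vote
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b
```

With two evenly matched models at 1500, a single win moves the winner up by K/2 = 16 points and the loser down by the same amount; real arenas typically start everyone at the same baseline and run this update over thousands of votes.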
Thanks - and no, I hadn't seen this one. I like their edit-mode dashboard: showing the original image plus two edits side by side. I was thinking about doing something similar.
I'm also a bit surprised they have gpt-image-1.5 so far above Nano Banana 2 - in my limited testing, at least on visual style, people prefer Nano Banana.
Yeah, I think that's part of the issue with a single "squashed" comparative metric: some users grade on overall visual fidelity, while others weight prompt following more heavily.
For a point of reference, I run a pretty comprehensive image model comparison site that is heavily weighted in favor of prompt adherence: https://genai-showdown.specr.net
EDIT: FWIW, I agree with your assessment. OpenAI's models have always been very strong on prompt adherence but visually weak (gpt-image-1 had the infamous "piss filter" until they finally shipped gpt-image-1.5).