For reference, have you seen the Artificial Analysis Image Arena Leaderboard? It also shows you two images from anonymized models (the model names are revealed after you vote) and calculates crowdsourced Elo ratings.
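For anyone curious, the crowdsourced Elo ratings these arenas compute boil down to the standard pairwise Elo update applied to each vote. A minimal sketch, assuming a plain Elo scheme with a fixed K-factor (the function and parameter names here are illustrative, not the leaderboard's actual code):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return new (r_a, r_b) ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)       # A's expected score
    s_a = 1.0 if a_won else 0.0          # A's actual score this vote
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b
```

With two evenly matched models at 1500, a single win moves the winner up by K/2 = 16 points and the loser down by the same amount; real arenas typically start everyone at the same baseline and run this update over thousands of votes.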
Thanks - and no, I hadn't seen this one. I like their edit-mode dashboard: showing the original image plus two edits side by side. I was thinking about doing something similar.
I'm also a bit surprised they have gpt-image-1.5 so far above Nano Banana 2 - in my limited testing, at least on visual style, people prefer Nano Banana.
Yeah, I think that's part of the issue with a single "squashed" comparative metric: some users grade on overall visual fidelity, while others weight prompt following more heavily.
For a point of reference, I run a pretty comprehensive image model comparison site that is heavily weighted in favor of prompt adherence: https://genai-showdown.specr.net
EDIT: FWIW, I agree with your assessment. OpenAI's models have always been very strong on prompt adherence but visually weak (gpt-image-1 had the infamous "piss filter" until they finally shipped gpt-image-1.5).