Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a strong focus on prompt adherence across a wide variety of text-to-image prompts.
I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still adherence, but it tests the ability to make localized edits to existing images using pure text prompts. It currently compares 6 multimodal models including Nano-Banana, Kontext Max, Qwen 20b, etc.
https://genai-showdown.specr.net/image-editing
Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12 which is especially surprising considering you can run the Dev model of it locally.
Don't know if it's the same for others, but my issue with Nano Banana has been the opposite. Ask it to make X significant change, and it spits out what I would've sworn is the same image. Sometimes, randomly and inexplicably, it spits out the expected result.
Anyone else experiencing this or have solutions for avoiding this?
Great comparison! Bookmarked to follow. Keep an eye on Grok; they're improving at a very rapid rate and I suspect they'll be near the top in the not-too-distant future.
Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation
Good collection of examples. Really weird to choose an inappropriate-for-work one as the second example.
This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview) I get garbage results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene. What it then does is simply cut and paste the character into the scene, even if they are completely different in style, colours, etc.
I get far better results using ChatGPT, for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in Paint in two minutes.
Am I using the wrong model, somehow??
Through that testing, there is one prompt engineering trend that was consistent but controversial: both a) LLM-style prompt engineering with Markdown-formatted lists and b) old-school AI-image-style quality syntactic sugar such as "award-winning" and "DSLR camera" are extremely effective with Gemini 2.5 Flash Image, due to its text encoder and larger training dataset, which can now more accurately discriminate which specific image traits are present in an award-winning image and which aren't. I've tried generations both with and without those tricks, and the tricks definitely have an impact. Google's developer documentation encourages the latter.
However, taking advantage of the 32k context window (compared to 512 for most other models) can make things interesting. It's possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...) and providing highly nuanced JSON can allow for consistent generations. (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...)
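For the curious, here is a minimal sketch of that structured-prompt style, assuming the google-genai Python SDK and the model name mentioned elsewhere in this thread; the prompt content itself is just an illustration:

```python
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# LLM-style prompt: a Markdown-formatted list plus old-school quality keywords.
prompt = """Generate an image with the following constraints:

- Subject: a corgi wearing a tiny chef's hat
- Setting: a sunlit farmhouse kitchen, morning light
- Style: award-winning photo, DSLR camera, shallow depth of field
"""

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)

# Generated images come back as inline bytes among the response parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)
```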
Unfortunately NSFW in parts. It might be insensitive to circulate the top URL in most US tech workplaces. For those venues, maybe you want to pick out isolated examples instead.
(Example: Half of Case 1 is an anime/manga maid-uniform woman lifting up front of skirt, and leaning back, to expose the crotch of underwear. That's the most questionable one I noticed. It's one of the first things a visitor to the top URL sees.)
Personally, I'm underwhelmed by this model. I feel like these examples are cherry-picked. Here are some fails I've had:
- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows
- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed-out, tinted colors
- When trying to reproduce the 3x3 grid of hairstyles, it repeatedly created a 2x3 grid. Finally, it made a 3x3 grid, but one of the nine models was Black instead of Caucasian.
- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looks like a crude Photoshop snip and copy/paste job.
- The second one in case 2 doesn't look anything like the reference map
- The face in case 5 changes completely despite the model being instructed not to do that
- Case 8 ignores the provided pose reference
- Case 9 changes the car positions
- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
- Case 27 shows the usual "models can't do text", though I'm not holding that against it too much
- Same with case 29, as well as the readable text not relating to the parts of the image it references
- Case 33 just generated a generic football ground
- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)
- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much
Super nice to see how honest they are about the capabilities!
This is amazing. Not that long ago, even getting a model to reliably output the same character multiple times was a real challenge. Now we’re seeing this level of composition and consistency. The pace of progress in generative models is wild.
Huge thanks to the author (and the many contributors) as well for gathering so many examples; it’s incredibly useful to see them to better understand the possibilities of the tool.
I've come to realize that I liked believing that there was something special about the human mental ability to use our mind's eye and visual imagination to picture something, such as how we would look with a different hairstyle. It's uncomfortable seeing that skill reproduced by machinery at the same level as my own imagination, or even better. It makes me feel like my ability to use my imagination is no more remarkable than my ability to hold a coat off the ground like a coat hook would.
As someone who can't visualize things like this in my head, and can only think about them intellectually, your own imagination is still special. When I heard that people can do that, it sounded like a superpower.
AI is like Batman, useless without his money and utility belt. Your own abilities are more like Superman, part of who you are and always with you, ready for use.
But you can find joy in the things you envision, or laugh, or be horrified. The mental ability is surely impressive, but having a reason to do it, and feeling something at the result, is what's special.
"To see a world in a grain of sand
And a heaven in a wild flower..."
We - humans - have reasons to be. We get to look at a sunset and think about the scattering of light and different frequencies and how it causes the different colors. But we can also just enjoy the beauty of it.
For me, every moment is magical when I take the time to let it be so. Heck, for there to even be a me responding to a you and all of the things that had to happen for Hacker News to be here. It's pretty incredible. To me anyway.
The proof of the pudding will be whether machines can develop new art styles. For example, there is a progression in comic/manga/anime art styles over the decades. If humans were to stop that kind of progression (they probably won't), would machines be able to continue it? In principle yes (we are biological machines of sorts), but likely not with the current AI architecture.
Vision has evolved frequently and quickly in the animal kingdom.
Conscious intelligence has not.
As another argument: we've had mathematical descriptions of optics, drawing algorithms, the fixed-function pipeline, ray tracing, and so much more rich math for drawing and animating.
Smart, thinking machines? We haven't the faintest idea.
Progress on Generative Images >> LLMs
Seriously? One could always cut-and-paste (not the computer term) a hairstyle over a photo of a person.
You are now marvelling at someone taking the collective output of humans around the world, then training a model on it with massive, massive compute… and then having a single human compete with that model.
Without the human output on the Internet, none of this would be possible. ImageNet was positively small compared to this.
But yeah, what you call "imagination" is basically perturbations and exploration across a model that you have in your head, which imposes constraints (e.g. gravity) that you learned. Obviously we can remix things now that they're on the Internet.
Having said that, after all that compute, the models had trouble rendering clocks that show an arbitrary time, or a glass of wine filled to the brim.
It does a pretty good job (most of the time) of sticking to the black-and-white coloring-book style while still bringing in enough detail to recognize the original photo in the output.
Man, I hate this. It all looks so good, and it's all so incorrect. Take the heart diagram, for example. Lots of words that sort of sound cardiac but aren't ("ventricar," "mittic"), and some labels that ARE cardiac, but are in the wrong place. The scenes generated from topo maps look convincing, but they don't actually follow the topography correctly. I'm not looking forward to when search and rescue people start using this and plan routes that go off cliffs. Most people I know are too gullible to understand that this is a bullshit generator. This stuff is lethal and I'm very worried it will accelerate the rate at which the populace is getting stupider.
Impressive examples, but with GenAI it always comes down to the fact that you have to cherry-pick the best result after many failed attempts. Right now, it feels like they're pushing the narrative that ExpectedOutput = LLM(Prompt, Input) when it's actually ExpectedOutput = LLM(Prompt, Input) * Takes, where Takes can vary from 1 to 100 or more.
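A minimal sketch of what that "Takes" factor looks like in practice, as a hypothetical best-of-N wrapper; generate and judge here stand in for any image-generation call and any quality check:

```python
from typing import Any, Callable

def generate_with_takes(
    generate: Callable[[str], Any],      # one sampled "take" of the model
    judge: Callable[[Any, str], float],  # scores how well a take matches the prompt
    prompt: str,
    takes: int = 8,
) -> Any:
    """Best-of-N sampling: output quality is a max over attempts, not one call."""
    best, best_score = None, float("-inf")
    for _ in range(takes):
        image = generate(prompt)
        score = judge(image, prompt)
        if score > best_score:
            best, best_score = image, score
    return best
```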
I think it might be the same as with programmers. It might look like AI agents can do all the programming, but when you actually try to use them to do things, it quickly turns out they're not so reliable.
One thing it couldn't do is a transparent background. The model just generates the checkerboard pattern in the background, not real alpha-channel transparency. You can even see artifacts in the pattern.
The training data is presumably full of examples of people using the pattern to indicate transparency (and explaining that they do so, like the input for case 50!), and has far fewer examples of people actually creating such images (if the training data even preserves the alpha channel in the first place).
I think a bigger problem is the "artifacts" you describe (worse than that sounds to me).
Yeah, mangled checkerboard patterns are common when the model is prompted to "remove" the background. It can be worked around by generating multiple images with only the background color varying (e.g. black and white) and reconstructing the alpha channel from their difference, since the model generally prefers to copy and paste when no other prompt overrides that preference.
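A minimal sketch of that difference-matting reconstruction, assuming two renders of the same subject that differ only in background color (file names are hypothetical):

```python
import numpy as np
from PIL import Image

# Two generations of the same subject, one on black and one on white.
black = np.asarray(Image.open("subject_on_black.png").convert("RGB")) / 255.0
white = np.asarray(Image.open("subject_on_white.png").convert("RGB")) / 255.0

# Over black: C_b = a*F. Over white: C_w = a*F + (1 - a).
# Subtracting gives C_w - C_b = 1 - a, so a = 1 - (C_w - C_b).
alpha = np.clip(1.0 - (white - black).mean(axis=2), 0.0, 1.0)

# Un-premultiply to recover the foreground color where alpha is nonzero.
a = np.maximum(alpha[..., None], 1e-3)
foreground = np.clip(black / a, 0.0, 1.0)

rgba = np.dstack([foreground, alpha])
Image.fromarray((rgba * 255).astype(np.uint8), "RGBA").save("subject_transparent.png")
```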
Does anyone else cringe when they see so many examples of sexualised young women? Literally, Case 1/B has a woman lifting up her skirt to reveal her underwear. For an otherwise very impressive model, you are spoiling the PR with this kind of immature content. Sheesh. I guess that confirms it: I am an old grumpy man! I count 26 examples with young women, and 9 examples with men. The only thing missing was "Lena": https://en.wikipedia.org/wiki/Lenna
I had to scroll down way too long for someone to point this out. It's messed up how casually racialised all these image-gen examples are towards young Asian women.
wait until you learn what prehistoric sculptors spent their time carving
VHS, online payments, video streaming... As the old song says, "the internet is porn".
I read your comment before checking the site and then I saw case one was a child followed by a sexy maid and I thought "oh no dear god" before I realized they weren't combining them into a single image.
While I think most of the examples are incredible...
...the technical graphics (especially text) are generally wrong. Case 16 is an annotated heart and the anatomy is nonsensical. Case 28, with the tallest buildings, has decent images but the wrong names, locations, and years.
I'm furnishing a new apartment, and Nano Banana has been super useful for placing furniture I want to purchase in rooms, to judge whether things will work for us or not. Take a picture of the room, feed Nano Banana that picture and the product picture, and ask it to place it in the right location. It can even imagine things at night, or add lamps with the lights on. Super useful!
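That workflow is just a multi-image prompt; a minimal sketch, again assuming the google-genai Python SDK (file names hypothetical):

```python
from google import genai
from PIL import Image

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

room = Image.open("living_room.jpg")    # photo of the room
product = Image.open("floor_lamp.jpg")  # product photo from the store

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[room, product,
              "Place this floor lamp in the empty corner by the window, "
              "matching the room's lighting and perspective."],
)

# Save the edited room image returned as inline bytes.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("room_with_lamp.png", "wb") as f:
            f.write(part.inline_data.data)
```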