top | item 45993500

(no title)

eminence32 | 3 months ago

> Generate better visuals with more accurate, legible text directly in the image in multiple languages

Assuming that this new model works as advertised, it's interesting to me that it took this long to get an image generation model that can reliably generate text. Why is text generation in images so hard?

discuss

unknown|3 months ago

[deleted]

Filligree|3 months ago

It’s not necessarily harder than other aspects. However:

- It requires an AI that actually understands English, I.e. an LLM. Older, diffusion-only models were naturally terrible at that, because they weren’t trained on it.

- It requires the AI to make no mistakes on image rendering, and that’s a high bar. Mistakes in image generation are so common we have memes about it, and for all that hands generally work fine now, the rest of the picture is full of mistakes you can’t tell are mistakes. Entirely impossible with text.

Nano Banana Pro seems to somewhat reliably produce entire pictures without any mistakes at all.

tobr|3 months ago

As a complete layman, it seems obvious that it should be hard? Like, text is a type of graphic that needs to be coherent both in its detail and its large structure, and there’s a very small amount of variation that we don’t immediately notice as strange or flat out incorrect. That’s not true of most types of imagery.

DesertVarnish|3 months ago

Largely but not entirely a data problem; specifically poor captioning. High quality captioning makes such a big difference.