top | item 47201655

Thanks for this!

Has there been any study of how grammar and word order affect the result? Is "Dog fetches ball with tail" more likely to produce an image of a dog holding a ball with its tail than "tail ball dog fetch with"?

Search engines have a similar problem: if a user searches for "best price on windows", do they mean Windows the OS or glass windows?

My impression, at least with the image generators I've used, is that while there is some mapping of words and maybe phrases through the latent space to an image, it's very weak. If you put "red ball" in a long prompt, "red" is nearly as likely to get applied to some other part of the description as to the ball.
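
One rough way to probe this would be to take a grammatical prompt, generate a few shuffled variants, and feed each to the same model with a fixed seed. Below is a minimal sketch of the prompt side only. The `shuffled_variants` helper is hypothetical and no image model is called here:

```python
import random


def shuffled_variants(prompt: str, n: int = 3, seed: int = 0) -> list[str]:
    """Return up to n word-order shuffles of prompt that differ from the original."""
    words = prompt.split()
    rng = random.Random(seed)  # fixed seed so the variants are reproducible
    variants: list[str] = []
    attempts = 0
    # Cap the attempts so very short prompts cannot loop forever.
    while len(variants) < n and attempts < 100 * n:
        attempts += 1
        shuffled = words[:]
        rng.shuffle(shuffled)
        candidate = " ".join(shuffled)
        if candidate != prompt and candidate not in variants:
            variants.append(candidate)
    return variants


for variant in shuffled_variants("Dog fetches ball with tail"):
    print(variant)
```

Each variant keeps exactly the same words, so any consistent difference in the images would come from order rather than vocabulary.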

whilefalse|15 hours ago

Honestly I don’t know the answer to that, but it’s a good question and something interesting to look into. The PRX model I used ran pretty well on my MacBook M4, so you could play around with it yourself, although I guess the results will depend on the specifics of the model.

When I was building this I did have to rework the prompts quite a bit so they worked nicely with the word-by-word reveal visualisation, i.e. they mention the subject early, then add adjectives about setting and light etc.
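
To illustrate the word-by-word reveal idea, here is a rough sketch of how a prompt could be split into cumulative prefixes, each of which would be sent to the model in turn. This is not the project's actual code. The `reveal_steps` name and the example prompt are made up for illustration:

```python
def reveal_steps(prompt: str) -> list[str]:
    """Return the cumulative prefixes of prompt, adding one word per step."""
    words = prompt.split()
    return [" ".join(words[:i]) for i in range(1, len(words) + 1)]


# Hypothetical prompt: subject first, then setting and light.
for step in reveal_steps("a red ball on wet grass at dusk"):
    print(step)
```

With a subject-first prompt like this, every prefix from the third word onward already names the subject, which is presumably why putting it early makes the reveal read better.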