
palashshah | 9 months ago

totally. the way i think about it (purely based on intuition) is that asking an LLM to do both understanding and image generation in a single step is too complex for it to be effective. if we split the work into discrete steps, the evaluation gets better, and the generation becomes plain instruction following.
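
a rough sketch of what i mean by discrete steps, in python. everything here is hypothetical: `run_pipeline`, `fake_understand`, `fake_generate` and `spec_mentions_subject` are made-up names, and the stubs just return strings in place of real model calls. it also assumes the understanding step outputs a plain-text spec that can be checked before anything is generated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    spec: str   # output of the understanding step
    image: str  # output of the generation step (a placeholder string here)


def run_pipeline(
    request: str,
    understand: Callable[[str], str],
    generate: Callable[[str], str],
    check_spec: Callable[[str], bool],
) -> PipelineResult:
    # step 1: understanding only; produces an explicit, inspectable spec
    spec = understand(request)
    # the spec is checked on its own, before any image is generated
    if not check_spec(spec):
        raise ValueError(f"spec failed evaluation: {spec!r}")
    # step 2: generation only; it just follows the spec
    return PipelineResult(spec=spec, image=generate(spec))


# hypothetical stubs standing in for real model calls
def fake_understand(request: str) -> str:
    return f"subject={request.strip().lower()}; style=flat illustration"


def fake_generate(spec: str) -> str:
    return f"<image rendered from: {spec}>"


def spec_mentions_subject(spec: str) -> bool:
    # passes only if the spec names a non-empty subject
    return spec.startswith("subject=") and not spec.startswith("subject=;")


result = run_pipeline(
    "A red bicycle", fake_understand, fake_generate, spec_mentions_subject
)
print(result.spec)
print(result.image)
```

the point of the split is that a bad spec fails at the check, so the generator only ever sees instructions that already passed.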
