> LLMs are awful at the spatial stuff

Could someone please elaborate on this? This is intriguing.

In general, text isn’t a great medium for transmitting spatial info. That’s why it’s easy for a model to understand an image but hard for it to draw an SVG of that image.

int_19h|18 days ago

This is a big reason why SOTA models are trained multimodal these days. Even when you're using them for text, the knowledge they gain from images and video improves their world models.