Wow! The ability to ingest the "cross product" of data on the internet and in the real world is huge; I bet a lot of what LMs don't know yet lives in that space. This seems a lot more general-purpose than CLIP, so I'm hopeful for even more impressive downstream applications, e.g., robotics.