chaoz_ | 3 months ago
Text and languages contain structured information and encode a lot of real-world complexity (or rather, a model of it).
Not saying we won't pivot to visual data or world simulations, but he was clearly not the type of person to compete with other LLM research labs, nor did he propose any alternative that could be used to create something interesting for end-users.
ACCount37|3 months ago
That whole take about language being basically useless without a human mind to back it up lost its legs in 2022.
Meanwhile, what do those "world model" AIs do? Video generation? Meta didn't release anything like that. Robotics, self-driving? Also basically nothing from Meta there.
Meanwhile, other companies are perfectly content to bolt multimodal transformers together for robotics tasks: Gemini Robotics being a research example, while the modern Tesla FSD stack is a production-grade one. Gemini even uses a language transformer as a key part of its stack.
unknown|3 months ago
[deleted]
KaiserPro|3 months ago
The issue is context. Trying to make an AI assistant with text-only inputs is doable but limiting. You need to know the _context_ of all the data, and without visual input most of it is useless.
For example, "Where is the other half of this?" is almost impossible to answer unless you have an idea of what "this" is.
But to do that you need cameras, and to use cameras you need position, object, and people tracking. And that is a hard problem that's not solved.
The hypothesis is that "world models" solve that with an implicit understanding of the world and the objects in context.
ACCount37|3 months ago
But that sure didn't happen.