encrux | 1 month ago
The fact that a language model can "reason" (in the LLM-slang meaning of the term) about 3D space is an interesting property.
If you give a text description of a scene and ask a robot to perform a peg-in-hole task, modern models can solve it fairly easily using movement primitives. I implemented this on a UR robot arm back in 2023.
The next logical step is, instead of having the model output text (code representing movement primitives), to have it output tokens in action space directly. This is what models like pi0 do.
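A minimal sketch of the first approach (the model emits a text plan, and the plan is dispatched to hand-written movement primitives). The primitive names, the `Pose` type, and the clearance/depth values are all illustrative assumptions, not from any real robot SDK or from the setup described above:

```python
# Sketch: dispatching an LLM's text plan onto movement primitives.
# All names here (Pose, move_above, insert) are hypothetical examples.
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    z: float


def move_above(target: Pose, clearance: float = 0.05) -> Pose:
    # Waypoint hovering `clearance` metres above the target.
    return Pose(target.x, target.y, target.z + clearance)


def insert(target: Pose, depth: float = 0.02) -> Pose:
    # Final pose after pushing the peg `depth` metres into the hole.
    return Pose(target.x, target.y, target.z - depth)


PRIMITIVES = {"move_above": move_above, "insert": insert}


def execute_plan(plan: list[str], target: Pose) -> list[Pose]:
    # The model's output is just a sequence of primitive names;
    # each one is resolved to a concrete waypoint for the controller.
    return [PRIMITIVES[step](target) for step in plan]


waypoints = execute_plan(["move_above", "insert"], Pose(0.4, 0.1, 0.10))
```

The action-space alternative (pi0-style) replaces the string plan with learned action tokens decoded straight into controller commands, removing the hand-written dispatch layer.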
volkercraig | 1 month ago
The latter part is interesting. I'm not sure how well one of those would perform once they're working, but my naive gut feeling is that splitting the language part and the driving part into two separate delegates is cleaner, safer, faster, and more predictable.
convolvatron | 1 month ago
Since this is a limited and continuous domain, it's far better suited to neural training than natural language is. I guess the notion that a language model should be used for 3D motion control is a real indicator of the level of thought going into some of these applications.