(no title)
var_cw|8 months ago
The point is how much non-vision sensors, as opposed to pure vision, help humans be humans. Don't you think LLMs have already proven that generalizability comes not from multi-modality but from scaling a single modality? And JEPA is for sure designed to do a better job at that than an LLM. So I have no doubt that raw scaling plus an RL boost will produce highly predictable and specific robotic movements.
godelski|8 months ago
Also, LLMs these days aren't trained on just language.
datameta|8 months ago
Could you expand on what you mean by this?