var_cw | 8 months ago

The point is how much non-vision sensors, versus pure vision, help humans be humans. Don't you think LLMs have already proven that generalizability comes not from multi-modality but from scaling a single modality? And JEPA is surely designed to do a better job at that than an LLM. So I have no doubt that raw scaling plus an RL boost will produce highly predictable and specific robotic movements.

godelski|8 months ago

  > LLMs already that generalizability

This is not a proven statement. In fact, it's pretty clear that LLMs don't generalize that way. They have some generalization, but not enough for what you're implying. The best way to see this is to carefully talk to an LLM about anything you have a lot of domain expertise in. Be careful not to give it answers (information leakage can sneak in subtly), and look specifically for the small, subtle details (that's why it needs to be a topic you have expertise in). "The smell" will be right but the information won't.

Also, LLMs these days aren't trained on just language.

datameta|8 months ago

> generalizability doesn't come from multi-modality but by scaling a single modality itself

Could you expand on what you mean by this?