Kinda genius to scale exoskeleton data collection with UMI grippers when most labs are chasing "general" VLMs / VLAs by training on human demonstration videos.

Imo the latter will be very useful for semantic planning and reasoning, but only after manipulation is solved.

A ballpark cost estimate:

- $10 to $20 hourly wages for the data collectors

- $100,000 to $200,000 per day for 10,000 hours of data

- ~1,500 to 2,500 data collectors doing 4 to 6 hours daily

- $750K to $1.25M on hardware costs at $500 per gripper

Fully loaded cost of between $4M and $8M for 270,000 hours of data.
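
To make the arithmetic easy to check, here is a rough Python sketch of the estimate. The constant and function names are made up for illustration, and it assumes one $500 gripper per collector and no overhead beyond wages and hardware. Under those assumptions the wage-plus-hardware total comes out a bit below the fully loaded range quoted above, which presumably also covers costs not itemized in the list.

```python
from math import ceil

# Ballpark inputs taken from the list above; all names are illustrative.
WAGE_RANGE = (10, 20)        # USD per collector-hour (low, high)
DAILY_DATA_HOURS = 10_000    # hours of data collected per day
SHIFT_RANGE = (4, 6)         # hours each collector works per day (low, high)
GRIPPER_COST = 500           # USD per gripper
TOTAL_DATA_HOURS = 270_000   # size of the full dataset in hours


def daily_wage_bill():
    """Wages per day for DAILY_DATA_HOURS, as (low, high) in USD."""
    return tuple(wage * DAILY_DATA_HOURS for wage in WAGE_RANGE)


def headcount():
    """Collectors needed per day, as (fewest, most).

    Longer shifts need fewer people, so the 6-hour shift gives the low end.
    """
    short_shift, long_shift = SHIFT_RANGE
    return (ceil(DAILY_DATA_HOURS / long_shift),
            ceil(DAILY_DATA_HOURS / short_shift))


def hardware_cost():
    """Gripper spend as (low, high), ASSUMING one gripper per collector."""
    return tuple(people * GRIPPER_COST for people in headcount())


def wages_plus_hardware():
    """Total wages for the full dataset plus hardware, as (low, high).

    This ignores any overhead (management, facilities, QA), so it is a
    lower bound on a fully loaded figure, not a replacement for it.
    """
    wages = tuple(wage * TOTAL_DATA_HOURS for wage in WAGE_RANGE)
    hardware = hardware_cost()
    return (wages[0] + hardware[0], wages[1] + hardware[1])


if __name__ == "__main__":
    print("daily wages:", daily_wage_bill())
    print("headcount:", headcount())
    print("hardware:", hardware_cost())
    print("wages + hardware:", wages_plus_hardware())
```

The exact headcount from 10,000 hours at 4 to 6 hours per shift is about 1,667 to 2,500, so the ~1,500 lower bound above is a rounder figure than the division gives.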

Not bad considering the alternatives.

For example, teleoperation is way less efficient: it's 5x-6x slower than human demos and 2x-3x more expensive per hour of operator time. But it could become feasible once low-level and mid-level manipulation and task planning are solved.
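
If the slowdown and the hourly cost premium are treated as independent factors that multiply (my assumption, not something measured), the cost per demonstration for teleoperation works out to roughly 10x-18x that of human demos. A minimal sketch, with a hypothetical helper name:

```python
def teleop_cost_multiplier(slowdown, hourly_cost_ratio):
    """Cost per demonstration relative to human demos.

    ASSUMES the two factors are independent: each demo takes `slowdown`
    times as long, and each operator-hour costs `hourly_cost_ratio` times
    as much, so the cost per demo scales with their product.
    """
    return slowdown * hourly_cost_ratio


if __name__ == "__main__":
    best_case = teleop_cost_multiplier(5, 2)    # 5x slower, 2x pricier
    worst_case = teleop_cost_multiplier(6, 3)   # 6x slower, 3x pricier
    print(f"teleop costs {best_case}x to {worst_case}x more per demo")
```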

v9v|3 months ago

Not teleoperating can have certain disadvantages, though, due to mismatches between how humans move and how robots move. See here: https://evjang.com/2024/08/31/motors.html

ACCount37|3 months ago

Intuitively, yes. But is it really true in practice?

Thinking about it, I'm reminded of various "additive training" tricks. Teach an AI to do A, and then to do B, and it might just generalize that to doing A+B with no extra training. Works often enough on things like LLMs.

In this case, we use non-robot data to teach an AI how to do diverse tasks, and robot-specific data (real or sim) to teach an AI how to operate a robot body. Which might generalize well enough to "doing diverse tasks through a robot body".

blueblisters|3 months ago

The exoskeletons are instrumented to match the kinematics and sensor suite of the actual robot gripper. You can trivially train a model on human-collected gripper data and replay it on the robot.