lyapunova|1 year ago

I can appreciate that, but they are also recording and replaying motor signals from specific teleoperation demonstrations, something that _was_ possible in the 1950s. You might say that it is challenging to replay demonstrations well on lower-quality hardware, and so there is academic value in trying to make it work on worse hardware, but it would not be my go-to solution for real industry problems. This is not a route I would fund as a startup, for example.

modeless|1 year ago

They do not replay recorded motor signals. They use recorded motor signals only to train neural policies, which then run autonomously on the robot and can generalize to new instances of a task (such as the above video generalizing to an adult-size sweater when it was only ever trained on child-size polo shirts).

Obviously some amount of generalization is required to fold a shirt, as no two shirts will ever be in precisely the same configuration after being dropped on a table by a human. Playback of recorded motor signals could never solve this task.
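The distinction can be sketched with a toy 1-D example (purely illustrative, not ALOHA's actual interfaces): open-loop replay applies recorded commands blind to the state, while a closed-loop policy recomputes each command from the current observation, which is what lets it adapt to a shirt landing somewhere new.

```python
# Toy 1-D illustration (hypothetical, not ALOHA's API): a gripper must
# reach an object. A replayed demonstration only works for the situation
# it was recorded in; a closed-loop policy conditions every command on
# the current observation.

def replay(demo_actions, start):
    """Open-loop: apply the recorded deltas, blind to the actual state."""
    x = start
    for a in demo_actions:
        x += a
    return x

def closed_loop(policy, start, target, steps=50):
    """Closed-loop: each step is recomputed from the current observation."""
    x = start
    for _ in range(steps):
        x += policy(x, target)
    return x

# Demonstration recorded with the gripper starting at 1.0, object at 0.0.
demo = [-0.1] * 10

# A proportional controller standing in for a learned policy.
policy = lambda x, target: 0.5 * (target - x)

replay(demo, start=1.0)            # ends at 0.0: replay succeeds
replay(demo, start=2.0)            # ends at 1.0: replay misses the object
closed_loop(policy, 2.0, 0.0)      # converges to ~0.0: closed loop adapts
```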

adolph|1 year ago

> recorded motor signals only to train neural policies

It's interesting that they are using "Leader Arms" [0] to encode tasks instead of motion capture. Is it just a matter of reduced complexity to get off the ground? I suppose the task of mapping human arm motion to what a robot can do is tough.

0. https://www.trossenrobotics.com/widowx-aloha-set

YeGoblynQueenne|1 year ago

I appreciate that going from polo shirts to sweaters is a form of "generalisation", but that's only interesting because of the extremely limited capacity for generalisation in systems trained by imitation learning, such as ALOHA.

Note for example that all the shirts in the videos are oriented in the same direction, with the neck facing to the top of the video. Even then, the system can only straighten a shirt that lands with one corner folded under it after many failed attempts, and if you turned a shirt so that the neck faced downwards, it wouldn't be able to straighten it and hang it no matter how many times it tried. Let's not even talk about getting a shirt tangled in the arms themselves (in the videos, a human intervenes to free the shirt and start again). It's trained to straighten a shirt on the table, with the neck facing one way [1].

So the OP is very right. We're no nearer to real-world autonomy than we were in the '50s. The behaviours of the systems you see in the videos are still hard-coded, only they're hard-coded by demonstration, with extremely low tolerance for variation in tasks or environments, and they still can't do anything they haven't been painstakingly and explicitly shown how to do. This is a severe limitation, and without a clear solution to it there's no autonomy.

On the other hand, ιδού πεδίον δόξης λαμπρόν ("behold, a splendid field of glory"), as we say in Greek. This is a wide open field full of hills to plant one's flag on. There's so much that robotic autonomy can't yet do that you can get Google to fund you if you can show a robot tying half a knot.

__________________

[1] Note btw that straightening the shirt is pointless: it will straighten up when you hang it. That's just to show the robot can do some random moves and arrive at a result that maybe looks meaningful to a human, but there's no way to tell whether the robot is sensing that it achieved a goal, or not. The straightening part is just a gimmick.

klowrey|1 year ago

We're building software for neuromorphic cameras specifically for robotics. If robots could actually understand motion in completely unconstrained situations, both optimal control and modern ML techniques would see an easy uplift in capability (things already work great in simulation, but in the real world you can't get positions and velocities accurately and at a high enough rate). Robots already have fast, accurate motors, but their vision systems are like seeing the world through a strobe light.

ewjt|1 year ago

This is not preprogrammed replay. Replay would not be able to handle even tiny variations in the starting position of the shirt.

lyapunova|1 year ago

So, a couple things here.

It is true that replay in the world frame will not handle initial position changes for the shirt. But if the commands are in the frame of the end-effector and the data is object-centric, replay will somewhat generalize. (Please also consider that you are watching the videos that survived the "should I upload this?" filter.)
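That point can be made concrete with a small sketch (the frames and names are illustrative, not from any ALOHA code): if recorded end-effector waypoints are stored relative to the object, the same recorded motion can be re-anchored at a new object pose, so replay tolerates the object being translated or rotated.

```python
import numpy as np

# Hypothetical sketch of object-centric replay in SE(2). Waypoints are
# (x, y) end-effector positions; an object pose is (x, y, yaw).

def to_object_frame(waypoints_world, obj_pose):
    """Express world-frame waypoints in the object's frame."""
    ox, oy, oyaw = obj_pose
    c, s = np.cos(-oyaw), np.sin(-oyaw)
    R = np.array([[c, -s], [s, c]])          # rotation by -yaw
    return (waypoints_world - np.array([ox, oy])) @ R.T

def replay_at(waypoints_obj, new_pose):
    """Map object-frame waypoints back into the world at a new object pose."""
    nx, ny, nyaw = new_pose
    c, s = np.cos(nyaw), np.sin(nyaw)
    R = np.array([[c, -s], [s, c]])          # rotation by +yaw
    return waypoints_obj @ R.T + np.array([nx, ny])

demo = np.array([[1.0, 0.0], [1.0, 1.0]])    # recorded with object at origin
rel = to_object_frame(demo, (0.0, 0.0, 0.0))
shifted = replay_at(rel, (2.0, 3.0, 0.0))    # same motion, object moved
```

Re-anchoring the demo at pose `(2.0, 3.0, 0.0)` simply translates every waypoint by `(2, 3)`; a nonzero yaw rotates the motion with the object. This is exactly the kind of generalization that frame-relative replay buys "for free", and no more.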

The second thing is that large-scale behavior cloning (which is the technique used here) is essentially replay with a little smoothing. Not inherently bad, just a fact.

My point is that an academic contribution was made when the first ALOHA paper came out and showed that BC on low-quality hardware could work, but this is something like the fourth paper in a row of roughly the same stuff.

Since this is YC, I'll add: as an academic (physics) turned investor, I would like to see more focus on systems engineering and first-principles thinking, and less PR for the sake of PR. I love robotics and really want to see this stuff take off, but for the right reasons.

johntb86|1 year ago

What do you mean by saying that they're replaying signals from teleoperation demonstrations? Like in https://twitter.com/DannyDriess/status/1780270239185588732, was someone demonstrating how to struggle to fold a shirt, then they put a shirt in the same orientation and had the robot repeat the same motor commands?