top | item 40054562


lyapunova|1 year ago

Sorry, but this is a lot of marketing for the same thing over and over again. I'm not against Aloha as an _affordable_ platform, but skimping on hardware is kind of a bug, not a feature. Moreover, it's not even _low-cost_: its BoM is still around $20k, and collecting all the data is labor-intensive and not cheap.

And if we're focusing on the idea, it has existed since the 1950s and they were doing it relatively well then:

https://www.youtube.com/watch?v=LcIKaKsf4cM


xg15|1 year ago

> skimping on hardware is kind of a bug, not a feature.

I have to disagree here. Not for $20k, but if you could really build a robot arm out of basically a desk lamp, some servos, and a camera, and had software to control it as precisely as this video claims, it would be a complete game changer. We'd probably see an explosion of attempts to automate all kinds of everyday household tasks that are infeasible to automate cost-effectively today (folding laundry, cleaning up the room, cooking, etc.).

Also, every self-respecting maker out there would probably try to build one :)

> And if we're focusing on the idea, it has existed since the 1950s and they were doing it relatively well then:

I don't quite understand how the video fits here. That's a manually operated robot arm. The point of Aloha is that it's fully controlled by software, right?

YeGoblynQueenne|1 year ago

If you want a robot that can fold your laundry, clean your room and cook, you need a lot more than cheap hardware. You need an autonomous agent (i.e. "an AI") that can guide the hardware to accomplish the task.

We're still very far from that, and you certainly can't do it with ALOHA in practice, despite what the videos may seem to show. For each of the few, discrete tasks you see in the videos, the robot arms have to be trained by demonstration (via teleoperation), and the end result is a system that can only copy the operator's actions with very little variation.

You can check this in the Mobile ALOHA paper on arxiv (https://arxiv.org/abs/2401.02117), where page 6 shows the six tasks the system has been trained to perform, along with the tolerances in the initial setup. So e.g. in the shrimp-cooking task, the initial position of the robot can vary by 10 cm and the position of the implements by 2 cm. If everything is not set up just so, the task will fail.
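To make the tolerance point concrete, here's a minimal sketch (all names illustrative, not ALOHA's actual code) of checking whether a scene is close enough to the setup the demonstrations were recorded from, using the 10 cm / 2 cm figures quoted above:

```python
# Hypothetical sketch: why initial-setup tolerances matter for a
# demonstration-based policy. Positions are (x, y) in metres; the
# tolerances mirror the ones quoted from the paper.
import math

TOLERANCES_M = {"robot_base": 0.10, "implement": 0.02}

def within_tolerance(kind, nominal, actual):
    """True if the actual position is close enough to where the
    object sat during data collection."""
    return math.dist(nominal, actual) <= TOLERANCES_M[kind]

def setup_ok(scene):
    """scene: list of (kind, nominal_xy, actual_xy) tuples."""
    return all(within_tolerance(k, n, a) for k, n, a in scene)

scene = [
    ("robot_base", (0.0, 0.0), (0.06, 0.0)),    # 6 cm off: OK (<= 10 cm)
    ("implement", (0.50, 0.20), (0.55, 0.20)),  # 5 cm off: fails (> 2 cm)
]
print(setup_ok(scene))  # False: a pan moved 5 cm breaks the rollout
```

A pan shifted a few centimetres from its demonstrated position is already outside tolerance, which is why "if everything is not set up just so, the task will fail."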

What all this means is that if you could assemble this "cheap" system, you'd then have to train it with a few hundred demonstrations to fold your laundry, and maybe it could do it (probably not), and if you moved the washing machine or got a new one, you'd have to train it all over again.

As to robots cleaning up your room and cooking, those are currently in the realm of science fiction, unless you're a zen ascetic living in an empty room and happy to eat beans on toast every day. Beans from a can, that is. You'll have to initialise the task by opening the can yourself, obviously. You have a toaster, right?

modeless|1 year ago

These videos are all autonomous. They didn't have that in the 1950s.

lyapunova|1 year ago

I can appreciate that, but they are also recording and replaying motor signals from specific teleoperation demonstrations, something that _was_ possible in the 1950s. You might say that it is challenging to replay demonstrations well on lower-quality hardware, and so there is academic value in trying to make it work on worse hardware, but it would not be my go-to solution for real industry problems. This is not a route I would fund for a startup, for example.
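The record-and-replay idea being described is simple enough to sketch; the names here are illustrative, not ALOHA's actual code:

```python
# Minimal sketch of record-and-replay teleoperation, the 1950s-era idea
# discussed above: log the operator's joint commands, then play the
# exact same commands back open-loop, with no feedback.

def record_demo(operator_commands):
    """Store the stream of joint-angle commands from a teleop session."""
    return list(operator_commands)  # one list of joint angles per tick

def replay(demo, send_to_motors):
    """Play the logged commands back verbatim. Because nothing is
    conditioned on the scene, the arm repeats the old motions even
    if the objects have moved."""
    for joint_angles in demo:
        send_to_motors(joint_angles)

# Usage: capture a 3-tick demo, then replay it into a fake motor bus.
demo = record_demo([[0.0, 0.5], [0.1, 0.5], [0.2, 0.4]])
played = []
replay(demo, played.append)
print(played == demo)  # True: the replay is an exact copy
```

The open-loop nature of the replay is exactly why the motions only work when the scene matches the demonstration.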

sashank_1509|1 year ago

I follow this space closely and I had never seen the 1950s teleoperation video; it blows my mind that people had this working back then. Now you just need to connect that to a transformer / diffusion policy and it will be able to perform the task autonomously maybe 80% of the time with 200+ demonstrations and close to 100% of the time with 1000+ demonstrations.
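The "demonstrations in, autonomous policy out" shape can be sketched with a toy stand-in. A real ALOHA-style policy is a transformer or diffusion model; this 1-nearest-neighbour version (all names hypothetical) only illustrates conditioning on the current observation rather than replaying one trajectory:

```python
# Toy behaviour-cloning sketch: given (observation, action) pairs
# gathered via teleoperation, act by returning the action whose
# recorded observation is nearest to the current one.
import math

def fit_policy(demos):
    """demos: list of (observation, action) pairs from teleop sessions."""
    def policy(obs):
        # Pick the demonstrated action from the closest recorded state.
        _, action = min(demos, key=lambda pair: math.dist(pair[0], obs))
        return action
    return policy

demos = [((0.0, 0.0), "reach"), ((0.5, 0.1), "grasp"), ((0.9, 0.3), "lift")]
policy = fit_policy(demos)
print(policy((0.45, 0.12)))  # "grasp": tolerates small offsets from the demos
```

Unlike verbatim replay, this closes the loop on observations, which is the step that turns recorded demonstrations into an autonomous (if brittle) policy.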

Aloha was not new, but it’s still good work because robotics researchers were not focused on this form of data collection. The issue was most people went into the simulation rabbit hole where they had to solve sim-to-real.

Others went into the VR handset and hand tracking idea, where you never got super precise manipulations and so any robots trained on that always showed choppy movement.

Others, including OpenAI, decided to go full reinforcement learning, forgoing human demonstrations, which had some decent results; but after 6 months of RL on an arm farm led by Google and Sergey Levine, the results were underwhelming, to say the least.

So yes, Aloha did not invent teleoperation, but it demonstrated that with this mode of teleoperation you could collect a lot of data, easily train autonomous robot policies, and beat other methods, which I think is a great contribution!

markisus|1 year ago

I’m not sure you can say that imitation learning was under-researched in the past. Imitation learning has been tried before, alongside RL, but it did not generalize well until the advent of generative diffusion models.