Having been a member of the robot learning community both in grad school and now in industry, I'd like to properly attribute something here, since it seems that TRI is (deservedly so, I'll agree wholeheartedly) receiving most of the praise:
The core of these advancements is powered by Diffusion Policy [1], which Prof. Shuran Song's lab at Columbia (before she recently moved to Stanford) developed and pioneered. I'd suggest everyone view the original project website [2]; it has a ton of amazing, challenging real-world experiments.
It was a community favorite for the Best Paper Award at this year's R:SS conference [3]. I remember our lab (and every other learning lab in our robotics department) absolutely dissecting this paper. I know of people who've pivoted entirely away from their behavior cloning/imitation learning projects to this approach, which deals with multi-modal action distributions much more naturally than the aforementioned approaches.
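For anyone wondering why this helps with multi-modality: a policy trained with a plain MSE loss averages over conflicting demonstrations, while a generative sampler can commit to a single mode. Here's a toy, purely illustrative sketch (not the paper's actual implementation; the fake "denoiser" below just stands in for what a trained one learns to do):

```python
import random
import statistics

# Toy dataset: demonstrations contain two equally valid actions
# for the same observation (go left: -1.0, or go right: +1.0).
demos = [-1.0] * 500 + [1.0] * 500

# Behavior cloning with an MSE loss converges to the mean action,
# 0.0 here -- an action no demonstrator ever took.
mse_optimal_action = statistics.mean(demos)

# A generative sampler (diffusion policies are one instance) instead
# draws from the demonstrated distribution, recovering both modes.
# We fake the "denoiser" by nudging a noise sample toward the nearest
# mode, which is roughly what a trained denoiser learns to do.
def sample_action(rng):
    x = rng.gauss(0.0, 1.0)              # start from pure noise
    for _ in range(20):                  # iterative denoising steps
        target = 1.0 if x > 0 else -1.0  # nearest demonstrated mode
        x += 0.3 * (target - x)          # move toward it
    return x

rng = random.Random(0)
samples = [sample_action(rng) for _ in range(1000)]
```

In the sketch, regression outputs 0.0 (an action nobody demonstrated), while the iterative sampler returns roughly -1 or +1 depending on the initial noise, which is exactly the behavior you want for go-left-or-go-right situations.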
Prof. Song is an absolute rockstar in robotics right now, with several wonderful approaches that scale elegantly to the real world, including IRP [4] (which won Best Paper at R:SS 2022), FlingBot [5], Scaling Up and Distilling Down [6], and much more. I recommend checking out her lab website too.
To be fair, they do credit Professor Song and the paper you linked. TRI is also listed as a collaborator on the paper.
> Diffusion Policy: TRI and our collaborators in Professor Song’s group at Columbia University developed a new, powerful generative-AI approach to behavior learning. This approach, called Diffusion Policy, enables easy and rapid behavior teaching from demonstration.
You may as well credit the information theorists, mathematicians, and physicists who laid out the fundamentals that brought us here.
They died before hardware achieved their decades-old visions. Not much of this work is a net-new description; it's more a matter of reconciling the old descriptions with observation, now that we can actually build the old ideas.
Cool to see Russ Tedrake's recent work! His online course Underactuated Robotics is very good for getting a grasp of the complexities faced in robotics.
It's exciting to see someone with a bit deeper knowledge than the "Flex Tape: slap an LLM on robotics" approach featured here, which describes the majority of robot learning work upvoted on HN.
There's more than just language to be solved before we can have proper embodied agents in the chaotic real world.
Thanks. In the video around 2:40, he describes it as a “kindergarten for robots”; that’s an interesting way to think about it. I wonder if it would be possible to crowdsource the training of new tasks with a standard training protocol. That way you post a bounty on the task you want, whoever solves it collects the bounty, and everyone benefits. The point is that there’s a long tail of tasks, and a centralized lab probably can’t do them all.
Google was doing something similar, and it was on HN about a month ago.[1]
I wonder how much force feedback they have. Is that big round squishy thing in the videos sort of like a big finger, with lots of pressure sensors? People have built area pressure sensors before, as far back as the 1980s, but nobody knew what to do with all that data back then. Today, too much sensor data is far less of a problem.
I once took a crack at this problem by equipping a robot arm with an end wrench. The idea was that it would feel around for a bolt head, get the wrench onto it, and turn. A 6-DOF force sensor is enough for that. But this was pre-deep-learning, and I didn't get very far, although I did build the wrench robot setup.
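That kind of search is still a nice illustration of force-guided manipulation. Here's a simulated sketch of the "feel around for the bolt head" idea (the sensor model, thresholds, and geometry are all invented for illustration): spiral outward over the surface and stop when the measured contact force drops, meaning the wrench has seated on the head.

```python
import math

# All values below are made up for illustration; a real setup would
# read a 6-DOF force/torque sensor and command an actual arm.

BOLT_XY = (0.004, -0.003)  # true bolt position, unknown to the search (m)
SEAT_RADIUS = 0.002        # within 2 mm, the wrench drops onto the head

def contact_force(x, y):
    """Fake sensor: high normal force on the flat surface, low once
    the wrench has seated on the bolt head."""
    dist = math.hypot(x - BOLT_XY[0], y - BOLT_XY[1])
    return 2.0 if dist > SEAT_RADIUS else 0.2  # newtons

def spiral_search(max_steps=5000, seat_threshold=0.5):
    """Archimedean spiral outward from the initial guess until the
    measured force falls below the threshold (wrench seated)."""
    for i in range(max_steps):
        angle = 0.2 * i
        radius = 0.00005 * i
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        if contact_force(x, y) < seat_threshold:
            return (x, y)  # seated on the bolt head
    return None            # never seated within the search budget

found = spiral_search()
```

The spiral pitch just has to be finer than the seating tolerance; the same pattern (probe, read force, decide) is what learned force-feedback policies now do implicitly.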
This looks impressive. Much more so than even the Boston Dynamics demonstrations.
Flipping a pancake is extremely difficult because each pancake is different. I know these videos must be cherry-picked, but being able to train a robot to do this just by demonstration feels like a massive leap.
Flipping a pancake was done in 2010. What looks impressive for humans is easy for robots and vice versa:
https://youtu.be/W_gxLKSsSIE?si=HDyNXe1Ys_eFXiVU
Another case in point: robot juggling was done in the 1990s, and to date we do not have a robot that can open any door as reliably as a human. Kind of like Moravec's paradox.
And here I thought manual labor jobs were safe for a very long time. I really hope people at the policy level are thinking about what it looks like to have a world of people who don't have any work to do.
This looks way better than PaLM-E because the robots they're using are more capable and the tasks much more complex. And they're doing the behaviors at the same speed a human does them while puppeteering the robot. The PaLM-E demonstrations were all shown in sped-up videos because they are agonizingly slow in reality.
This is getting pretty close to how I think we get to the general purpose humanoid robot. This is how I see it playing out:
- You have your Boston Dynamics-style humanoid robot at the job site; let's say it's a bricklayer for the purposes of this example.
- You have a human somewhere offsite in an open room with an omnidirectional treadmill floor, and cameras and depth sensors positioned all around the room. They're wearing a Hollywood style motion capture suit and have a VR headset on so they can see what the humanoid robot sees through their cameras.
- The human then acts as they would on site, walking up to the pile of bricks, picking them up, placing them etc. The robot moves in real time on the job site, mimicking whatever action the human performs. I don't know if you'll need props to do this properly or if the muscle memory from years on the job will be enough for the humans to get the motions right.
- You log all the data. You then have someone watch through the video stream, labelling each action that is being performed.
- You run it all through a machine learning algorithm, until you get to the point where you can just send the architectural plan to the robot and essentially say "Build this wall for me".
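The pipeline above is essentially behavior cloning from teleoperation. A toy sketch of the learning step (the observations, labels, and nearest-neighbor "policy" here are all invented for illustration; a real system would train a network on raw sensor streams):

```python
# Behavior cloning maps logged observations to the teleoperator's
# actions. Here the "policy" is a 1-nearest-neighbor lookup over the
# logged pairs; a real system would fit a neural network instead.

# Logged (observation, action) pairs from a teleoperated session.
# Observation: (distance_to_brick_pile, holding_brick); action: the
# label a human annotator attached to that stretch of video.
demo_log = [
    ((2.0, 0), "walk_to_pile"),
    ((0.1, 0), "pick_up_brick"),
    ((0.1, 1), "walk_to_wall"),
    ((1.5, 1), "walk_to_wall"),
    ((0.0, 1), "place_brick"),
]

def cloned_policy(obs):
    """Return the logged action whose observation is closest to obs."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(demo_log, key=lambda pair: sq_dist(pair[0], obs))[1]

# The robot now generalizes (crudely) to observations it never logged.
action = cloned_policy((1.8, 0))
```

The hard part this sketch hides is exactly the point of the thread: getting from a handful of logged pairs to a model that covers the long tail of situations on a real job site.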
We use the term "large language models" because the entire world wide web, the Library of Congress, etc. have produced a truly vast amount of written content, so LLMs have massive datasets available for learning. That's what I understand to be the "large" part: an unbelievable amount of written content from a huge number of sources (both public domain and more questionably so).
When this video refers to a "large behavioral model", where's the "large" part? Where are they getting a similarly "large" amount of behavioral input data? It looks like they have a big lab with a few dozen people modeling behaviors. That's great, but it's not as if that number of people could produce as much content as all of digital written text.
This seems pretty cool. But I'm not clear how someone can be a (full-time) professor at MIT and also be a (full-time) vice president at TRI. I've seen this kind of two-job situation before but never understood how it's practical, unless the person works 70+ hours a week.
I would disagree. All of what we are seeing from this latest surge in AI is essentially jumped-up predictive text. To get to C-3PO, there is a whole additional layer of intelligence needed. C-3PO can make plans and execute those plans. This latest wave cannot reason about the world; it does not know or understand the world, it just assembles words (and here, motions) in a way that we value. It is not planning anything.
We'll borrow work from the future to keep people busy. There will always be plenty of work. In addition to organic vs. other food, there will be stalls of human-grown vs. machine-grown food, etc.
When the first tractor arrived in my village, my grandfather joked that all the landless labourers would die of hunger, since there wouldn't be work for them. Manual ploughing declined, but a number of other jobs became routine. These days it's hard to find labour in my village (western UP).
"American manufacturers use far fewer robots than their competitors, in particular in Taiwan, South Korea and China" [1]. And specialized manufacturing is in a permanent skills shortage. More automation may boost employment and wages for blue-collar workers. Particularly if such kit enables new entrants to challenge incumbents.
There’s no real dignity in work that can easily be done by a robot. A lot of these jobs make people miserable anyway, maybe we shouldn’t be fighting so hard to keep them.
It might just take a while for it to be economical for lots of jobs. The number of humans is increasing; the amount of natural resources is a different story.
Most of you have deep, practical experience with robotics and robots; for me, anything a robot demonstrates is a magical and scary thing. Now I'm having mild paranoia over all this progress in LLMs, and now these activities done by robots. What future might roll out in front of us? What are your opinions on that?
This will become increasingly normal, but it'll take a while before the massive impact (i.e., taking your and your friends' jobs). I'm not sure of the timeline. Humans are slow to adapt, but in 20-30 years I think the pace will pick up.
I’m not too worried about the current generation, but my kids. Don’t know what to tell them TBH.
I guess it’ll all be fine though. We techies tend to have a paranoid streak which isn’t becoming.
> “Our research in robotics is aimed at amplifying people rather than replacing them,” said Gill Pratt, CEO of TRI and Chief Scientist for Toyota Motor Corporation.
Why do CEOs make public statements such as this when the goal is humanoid robots to replace human labor, particularly in countries with declining birthrates?
Cars with electric power steering have steering force sensors, as well as accelerometers, though cars generally don't have a central computer; they're a networked collection of feature computers in a topology somewhat like a sugar molecule.
I'm pretty ignorant of state of the art robotics and had assumed for years that approaches like this were used, e.g. by Boston Dynamics. Surprising to see that it's a new thing.
momofuku | 2 years ago
[1] - https://arxiv.org/abs/2303.04137
[2] - https://diffusion-policy.cs.columbia.edu/
[3] - https://roboticsconference.org/program/awards/
[4] - https://irp.cs.columbia.edu/
[5] - https://flingbot.cs.columbia.edu/
[6] - https://www.cs.columbia.edu/~huy/scalingup/
tomp | 2 years ago
What makes it work so much better than alternatives mentioned above?
mdonahoe | 2 years ago
Which lab are you referring to?
neom | 2 years ago
https://www.youtube.com/watch?v=w-CGSQAO5-Q
[1] https://news.ycombinator.com/item?id=37167698
reportingsjr | 2 years ago
Very exciting times in robotics!
paulsutter | 2 years ago
Really, start thinking carefully about what you’re working on. Until now, the new AI has been language-only, not spatial. That’s over.
[1] https://www.wsj.com/economy/american-labors-real-problem-it-...
JoeAltmaier | 2 years ago
Now that they can learn quickly, perhaps a more dexterous robot with flexible digits etc. will become the norm.
nico | 2 years ago
Are there any cars out there that have something like a sense of touch and with it can sense the road or things that they crash into?
jpadkins | 2 years ago
For everything else, isn't touch too late, especially at high speeds? The point is to avoid the crash.
Kerb_ | 2 years ago
https://wellsve.com/products/electrical-lighting-body-system....