Having been a member of the robot learning community both in grad school and now in industry, I'd like to properly attribute something here, since it seems that TRI is (deservedly so, I'll agree wholeheartedly) receiving most of the praise:
The core of these advancements is powered by Diffusion Policy [1], which Prof. Shuran Song's lab at Columbia (before she recently moved to Stanford) developed and pioneered. I'd suggest everyone view the original project website [2]; it has a ton of amazing, challenging real-world experiments.
It was a community favorite for the Best Paper Award at this year's R:SS conference [3]. I remember our lab (and every other learning lab in our robotics department) absolutely dissecting this paper. I know of people who've pivoted entirely away from their behavior cloning/imitation learning projects to this approach, which deals with multi-modal action distributions much more naturally than the aforementioned approaches.
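For anyone wondering why this helps with multi-modality: a policy trained with a plain MSE loss averages over conflicting demonstrations, while a generative sampler can commit to a single mode. Here's a toy, purely illustrative sketch (not the paper's actual implementation; the fake "denoiser" below just stands in for what a trained one learns to do):

```python
import random
import statistics

# Toy dataset: demonstrations contain two equally valid actions
# for the same observation (go left: -1.0, or go right: +1.0).
demos = [-1.0] * 500 + [1.0] * 500

# Behavior cloning with an MSE loss converges to the mean action,
# 0.0 here -- an action no demonstrator ever took.
mse_optimal_action = statistics.mean(demos)

# A generative sampler (diffusion policies are one instance) instead
# draws from the demonstrated distribution, recovering both modes.
# We fake the "denoiser" by nudging a noise sample toward the nearest
# mode, which is roughly what a trained denoiser learns to do.
def sample_action(rng):
    x = rng.gauss(0.0, 1.0)              # start from pure noise
    for _ in range(20):                  # iterative denoising steps
        target = 1.0 if x > 0 else -1.0  # nearest demonstrated mode
        x += 0.3 * (target - x)          # move toward it
    return x

rng = random.Random(0)
samples = [sample_action(rng) for _ in range(1000)]
```

In the sketch, regression outputs 0.0 (an action nobody demonstrated), while the iterative sampler returns roughly -1 or +1 depending on the initial noise, which is exactly the behavior you want for go-left-or-go-right situations.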
Prof. Song is an absolute rockstar in robotics right now, with several wonderful approaches that scale elegantly to the real world, including IRP [4] (which won Best Paper at R:SS 2022), FlingBot [5], Scaling Up and Distilling Down [6], and much more. I recommend checking out her lab website too.
To be fair, they do credit Professor Song and the paper you linked. TRI is also listed as a collaborator on the paper.
> Diffusion Policy: TRI and our collaborators in Professor Song’s group at Columbia University developed a new, powerful generative-AI approach to behavior learning. This approach, called Diffusion Policy, enables easy and rapid behavior teaching from demonstration.
You may as well credit the information theorists, mathematicians, and physicists who laid out the fundamentals that brought us here.
They died before hardware achieved their decades-old visions. Not much of this work is a net-new description; it's more a matter of reconciling the old descriptions with observation, now that we can actually build the old ideas.
Cool to see Russ Tedrake's recent work! His online course Underactuated Robotics is very good for getting a grasp of the complexities faced in robotics.
It's exciting to see someone with a bit deeper knowledge than the "Flex Tape: slap an LLM on robotics" approach featured here, which describes the majority of robot learning work upvoted on HN.
There's more than just language to be solved before we can have proper embodied agents in the chaotic real world.
Thanks. In the video around 2:40, he describes it as a “kindergarten for robots”; that’s an interesting way to think about it. I wonder if it would be possible to crowdsource the training of new tasks with a standard training protocol. That way you post a bounty on the task you want, whoever solves it collects the bounty, and everyone benefits. The point is that there’s a long tail of tasks, and a centralized lab probably can’t do them all.
Google was doing something similar, and it was on HN about a month ago.[1]
I wonder how much force feedback they have. Is that big round squishy thing in the videos sort of like a big finger, with lots of pressure sensors? People have built area pressure sensors before, as far back as the 1980s, but nobody knew what to do with all that data back then. Today, too much sensor data is far less of a problem.
I once took a crack at this problem by equipping a robot arm with an end wrench. The idea was that it would feel around for a bolt head, get the wrench onto it, and turn. A 6-DOF force sensor is enough for that. But this was pre-deep-learning, and I didn't get very far, although I did build the wrench robot setup.
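That kind of search is still a nice illustration of force-guided manipulation. Here's a simulated sketch of the "feel around for the bolt head" idea (the sensor model, thresholds, and geometry are all invented for illustration): spiral outward over the surface and stop when the measured contact force drops, meaning the wrench has seated on the head.

```python
import math

# All values below are made up for illustration; a real setup would
# read a 6-DOF force/torque sensor and command an actual arm.

BOLT_XY = (0.004, -0.003)  # true bolt position, unknown to the search (m)
SEAT_RADIUS = 0.002        # within 2 mm, the wrench drops onto the head

def contact_force(x, y):
    """Fake sensor: high normal force on the flat surface, low once
    the wrench has seated on the bolt head."""
    dist = math.hypot(x - BOLT_XY[0], y - BOLT_XY[1])
    return 2.0 if dist > SEAT_RADIUS else 0.2  # newtons

def spiral_search(max_steps=5000, seat_threshold=0.5):
    """Archimedean spiral outward from the initial guess until the
    measured force falls below the threshold (wrench seated)."""
    for i in range(max_steps):
        angle = 0.2 * i
        radius = 0.00005 * i
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        if contact_force(x, y) < seat_threshold:
            return (x, y)  # seated on the bolt head
    return None            # never seated within the search budget

found = spiral_search()
```

The spiral pitch just has to be finer than the seating tolerance; the same pattern (probe, read force, decide) is what learned force-feedback policies now do implicitly.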
This looks impressive. Much more so than even the Boston Dynamics demonstrations.
Flipping a pancake is extremely difficult because each pancake is different. I know these videos must be cherry-picked, but being able to train a robot to do this just by demonstration feels like a massive leap.
Flipping a pancake was done in 2010. What looks impressive for humans is easy for robots and vice versa:
https://youtu.be/W_gxLKSsSIE?si=HDyNXe1Ys_eFXiVU
Another case in point: robot juggling was done in the 1990s, and to date we do not have a robot that can open any door as reliably as a human. Kind of like Moravec's paradox.
And here I thought manual labor jobs were safe for a very long time. I really hope people at the policy level are thinking about what it looks like to have a world of people who don't have any work to do.
This looks way better than PaLM-E because the robots they're using are more capable and the tasks much more complex. And they're doing the behaviors at the same speed a human does them while puppeteering the robot. The PaLM-E demonstrations were all shown in sped-up videos because they are agonizingly slow in reality.
This is getting pretty close to how I think we get to the general purpose humanoid robot. This is how I see it playing out:
- You have your Boston Dynamics-style humanoid robot at the job site; let's say it's a bricklayer for the purposes of this example.
- You have a human somewhere offsite in an open room with an omnidirectional treadmill floor, and cameras and depth sensors positioned all around the room. They're wearing a Hollywood style motion capture suit and have a VR headset on so they can see what the humanoid robot sees through their cameras.
- The human then acts as they would on site, walking up to the pile of bricks, picking them up, placing them etc. The robot moves in real time on the job site, mimicking whatever action the human performs. I don't know if you'll need props to do this properly or if the muscle memory from years on the job will be enough for the humans to get the motions right.
- You log all the data. You then have someone watch through the video stream, labelling each action that is being performed.
- You run it all through a machine learning algorithm, until you get to the point where you can just send the architectural plan to the robot and essentially say "Build this wall for me".
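The pipeline above is essentially behavior cloning from teleoperation. A toy sketch of the learning step (the observations, labels, and nearest-neighbor "policy" here are all invented for illustration; a real system would train a network on raw sensor streams):

```python
# Behavior cloning maps logged observations to the teleoperator's
# actions. Here the "policy" is a 1-nearest-neighbor lookup over the
# logged pairs; a real system would fit a neural network instead.

# Logged (observation, action) pairs from a teleoperated session.
# Observation: (distance_to_brick_pile, holding_brick); action: the
# label a human annotator attached to that stretch of video.
demo_log = [
    ((2.0, 0), "walk_to_pile"),
    ((0.1, 0), "pick_up_brick"),
    ((0.1, 1), "walk_to_wall"),
    ((1.5, 1), "walk_to_wall"),
    ((0.0, 1), "place_brick"),
]

def cloned_policy(obs):
    """Return the logged action whose observation is closest to obs."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(demo_log, key=lambda pair: sq_dist(pair[0], obs))[1]

# The robot now generalizes (crudely) to observations it never logged.
action = cloned_policy((1.8, 0))
```

The hard part this sketch hides is exactly the point of the thread: getting from a handful of logged pairs to a model that covers the long tail of situations on a real job site.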
We use the term "large language models" because the entire world wide web, the Library of Congress, etc. have produced a truly vast amount of written content, so LLMs have massive datasets available for learning. That's what I understand to be the "large" part: an unbelievable amount of written content from a huge number of sources (both public domain and more questionably so).
When this video refers to a "large behavioral model", where's the "large" part? Where are they getting a similarly "large" amount of behavioral input data? It looks like they have a big lab with a few dozen people modeling behaviors. That's great, but it's not as if that number of people could produce as much content as all of digital written text.
This seems pretty cool. But I'm not clear how someone can be a (full-time) professor at MIT and also be a (full-time) vice president at TRI. I've seen this kind of two-job situation before but never understood how it's practical, unless the person works 70+ hours a week.
I would disagree. All of what we are seeing from this latest surge in AI is essentially jumped-up predictive text. To get to C-3PO, there is a whole additional layer of intelligence needed. C-3PO can make plans and execute those plans. This latest wave cannot reason about the world; it does not know or understand the world, it just assembles words (and here, motions) in a way that we value. It is not planning anything.
We'll borrow work from the future to keep people busy. There will always be plenty of work. In addition to organic vs. other food, there will be stalls of human-grown vs. machine-grown food, etc.
When the first tractor arrived in my village, my grandfather joked that all the landless labourers would die of hunger, since there wouldn't be work for them. Manual ploughing declined, but a number of other jobs became routine. These days it's hard to find labour in my village (western UP).
"American manufacturers use far fewer robots than their competitors, in particular in Taiwan, South Korea and China" [1]. And specialized manufacturing is in a permanent skills shortage. More automation may boost employment and wages for blue-collar workers. Particularly if such kit enables new entrants to challenge incumbents.
There’s no real dignity in work that can easily be done by a robot. A lot of these jobs make people miserable anyway, maybe we shouldn’t be fighting so hard to keep them.
It might just take a while for it to be economical for lots of jobs. The number of humans is increasing; the amount of natural resources is a different story.
Most of you have deep, practical experience with robotics and robots; for me, anything a robot demonstrates is a magical and scary thing. Now I'm having mild paranoia over all this progress in LLMs, and now these activities done by robots. What future might roll out in front of us? What are your opinions on that?
This will become increasingly normal, but it'll take a while before the massive impact (i.e., taking your and your friends' jobs). I'm not sure of the timeline. Humans are slow to adapt, but in 20-30 years I think the pace will pick up.
I’m not too worried about the current generation, but my kids. Don’t know what to tell them TBH.
I guess it’ll all be fine though. We techies tend to have a paranoid streak which isn’t becoming.
> “Our research in robotics is aimed at amplifying people rather than replacing them,” said Gill Pratt, CEO of TRI and Chief Scientist for Toyota Motor Corporation.
Why do CEOs make public statements such as this when the goal is humanoid robots to replace human labor, particularly in countries with declining birthrates?
Cars with electric power steering have steering force sensors, as well as accelerometers, though cars generally don't have a central computer; they're a networked collection of feature computers in a topology somewhat like a sugar molecule.
I'm pretty ignorant of state of the art robotics and had assumed for years that approaches like this were used, e.g. by Boston Dynamics. Surprising to see that it's a new thing.
momofuku | 2 years ago
[1] - https://arxiv.org/abs/2303.04137
[2] - https://diffusion-policy.cs.columbia.edu/
[3] - https://roboticsconference.org/program/awards/
[4] - https://irp.cs.columbia.edu/
[5] - https://flingbot.cs.columbia.edu/
[6] - https://www.cs.columbia.edu/~huy/scalingup/
tomp | 2 years ago
What makes it work so much better than alternatives mentioned above?
mdonahoe | 2 years ago
Which lab are you referring to?
neom | 2 years ago
https://www.youtube.com/watch?v=w-CGSQAO5-Q
[1] https://news.ycombinator.com/item?id=37167698
reportingsjr | 2 years ago
Very exciting times in robotics!
paulsutter | 2 years ago
Really, start thinking carefully about what you’re working on. Until now, the new AI has been language-only, not spatial. That’s over.
[1] https://www.wsj.com/economy/american-labors-real-problem-it-...
JoeAltmaier | 2 years ago
Now that they can learn quickly, perhaps a more dexterous robot with flexible digits etc. will become the norm.
nico | 2 years ago
Are there any cars out there that have something like a sense of touch and with it can sense the road or things that they crash into?
jpadkins | 2 years ago
For everything else, isn't touch too late, especially at high speeds? The point is to avoid the crash.
Kerb_ | 2 years ago
https://wellsve.com/products/electrical-lighting-body-system....