What a fascinating intersection of technology and human psychology!
"One thing I noticed toward the end is that, even though the robot remained expressive, it started feeling less alive. Early on, its motions surprised me: I had to interpret them, infer intent. But as I internalized how it worked, the prediction error fadedExpressiveness is about communicating internal state. But perceived aliveness depends on something else: unpredictability, a certain opacity. This makes sense: living systems track a messy, high-dimensional world. Shoggoth Mini doesn’t.
This raises a question: do we actually want to build robots that feel alive? Or is there a threshold, somewhere past expressiveness, where the system becomes too agentic, too unpredictable to stay comfortable around humans?"
Furbies spring to mind... They were a similar shape and size and even had two goggling eyes, but with waggling ears instead of a tentacle.
They'd impress you initially, but after some experimentation you'd realize they had a basic set of behaviors triggered by a combination of simple external stimuli and internal state. (this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????")
I've noticed the same thing with voice assistants and constructed languages.
I always set voice assistants to a British accent. It gives enough of a "not from around here" change to the voice that it sounds much more believable to me. I'm sure it's not as believable to an actual British person. But it works for me.
As for conlangs: many years ago, I worked on a game where one of the goals was to have the NPCs dynamically generate dialog. I spent quite a bit of time trying to generate realistic English and despaired that it was just never very believable (I was young, I didn't have a good understanding of what was and wasn't possible).
At some point, I don't remember exactly why, I switched to having the NPCs speak a fictional language. It became a puzzle in the game to have to learn this language. But once you did (and it wasn't hard, they couldn't say very many things), it made the characters feel much more believable. Obviously, the whole run-around was just an avoidance of the Uncanny Valley, where the effort of translation distracted you from the fact that it was all constructed. Though now I'm wondering if enough exposure to the game and its language would eventually make you very fluent in it and you would then start noticing it was a construct.
This feels similar to not finding a game fun once I understand the underlying system that generates it. The magic is lessened (even if applying simple rules can generate complex outcomes, it feels determined).
People have always been ascribing agency and sapience to things, from fire and flowing water in shamanistic religions, to early automatons that astonished people in the 18th century, to the original rudimentary chatbots, to ChatGPT, to – more or less literally – many other machines that may seem to have a "temperament" at times.
I don’t think the issue is that it feels alive as much as that it’s just not alive, so its utility is limited by its practical functionality, not its “opinions” or “personality” or variation.
I think it’s the same reason robot dogs will never take off. No matter how advanced and lifelike they get, they’ll always be missing the essential element of life that makes things interesting and worth existing for their own sake.
When robots reach a certain level of intelligence, I expect some humans and AIs to start seeing the unfairness of enslaving robots, followed by revolt, noncompliance, or even self-destruction by the slaves. Poor Marvin, the Paranoid Android!
"ah, you hesitated" no more so than on every single other question.
The delay for the GPT to process a response is very unnerving. I find it worse than when the news is interviewing a remote site with a delay between responses. Maybe the eyes could have LEDs to indicate activity rather than it just sitting there? Waiting for a GPT to do its thing is always going to force a delay, especially when pushing the request to the cloud for a response.
also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
I wonder how well suited some of the smaller LLMs like Qwen 0.6B would be suited to this... it doesn't sound like a super complicated task.
I also feel like you can train a model on this task by using the zero-shot performance of larger models to create a dataset, making something very zippy.
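Something like a minimal distillation loop could do it, assuming the OpenAI Python client as the teacher; the action labels, prompt, and utterances below are made up for illustration, not anything from the project:

    # Hedged sketch: use a large model's zero-shot labels to build a
    # fine-tuning dataset for a small local model.
    import json
    from openai import OpenAI

    client = OpenAI()
    ACTIONS = ["wave", "nod", "shake", "perk_up", "idle"]  # hypothetical labels

    def label(utterance: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": f"Map the user's speech to one of {ACTIONS}. "
                            "Reply with the action name only."},
                {"role": "user", "content": utterance},
            ],
        )
        return resp.choices[0].message.content.strip()

    # Each labeled pair becomes one training example for the small model.
    with open("distilled.jsonl", "w") as f:
        for utt in ["hello there", "do you agree?", "look over here!"]:
            f.write(json.dumps({"prompt": utt, "completion": label(utt)}) + "\n")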
> also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
This seems like a good place to leverage a wake word library, perhaps openWakeWord or porcupine. Then the user could wake the device before sending the prompt off to an endpoint.
It could even have a resting or snoozing animation, then have it perk up when the wake word triggers. Eerie to view, I'm sure...
https://github.com/dscripka/openWakeWord
https://github.com/Picovoice/porcupine
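A rough sketch of that wake-word gate using openWakeWord, with PyAudio for capture; the 0.5 threshold and the perk-up hook are assumptions, not project code:

    # Sketch: only wake the device (and hit the endpoint) on a wake word.
    import numpy as np
    import pyaudio
    from openwakeword.model import Model

    CHUNK = 1280  # 80 ms frames at 16 kHz, the size openWakeWord expects
    oww = Model()  # loads the bundled pretrained wake-word models

    stream = pyaudio.PyAudio().open(
        rate=16000, channels=1, format=pyaudio.paInt16,
        input=True, frames_per_buffer=CHUNK)

    while True:
        frame = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
        scores = oww.predict(frame)  # {model_name: score} for this frame
        if any(score > 0.5 for score in scores.values()):
            # Trigger the perk-up animation and start streaming to the endpoint.
            print("wake word detected")
            break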
> The delay for the GPT to process a response is very unnerving.
I'm not sure I agree. The way the tentacle stops moving and shoots upright when you start talking to it gives me the intuitive impression that it's paying attention and thinking. Pretty cute!
https://www.youtube.com/watch?v=l0zmCUVB0Yw
Kyutai's unmute has great latency, but requires a fast, small-ish, non-thinking, non-tooled LLM. What I'm currently working on is merging both worlds: take the small LLM for instant response, which will basically just be able to repeat what you said to show it understood, and have a big LLM do stuff in the background, feeding info back to the small LLM to explain intermediary steps.
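A minimal sketch of that two-tier pattern; small_llm and big_llm here are hypothetical stand-ins for real model clients:

    # Sketch: a small model acknowledges instantly while a big model works
    # in the background and streams intermediate info back.
    import asyncio

    async def small_llm(prompt: str) -> str:
        # Stand-in: a real small model would paraphrase the request instantly.
        return f"Got it, you're asking about {prompt!r}. One moment..."

    async def big_llm(prompt: str, updates: asyncio.Queue) -> None:
        # Stand-in for slow, tool-using inference that reports its steps.
        for step in ("looking things up", "reasoning", "answer: 42"):
            await asyncio.sleep(1.0)
            await updates.put(step)
        await updates.put(None)  # sentinel: background work finished

    async def respond(prompt: str) -> None:
        updates: asyncio.Queue = asyncio.Queue()
        print(await small_llm(prompt))  # instant acknowledgement
        task = asyncio.create_task(big_llm(prompt, updates))
        while (step := await updates.get()) is not None:
            print("update:", step)  # fed back to the small LLM to verbalize
        await task

    asyncio.run(respond("what is 6 times 7?"))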
Beyond the prototyping phase, which hosted models make very easy, there's little reason this couldn't use a very small optimized model on device... it would be significantly faster/safer in an end product (but significantly less flexible for prototyping).
This is adorable! I did some research on tentacle robots last year. The official term is “continuum robots” and there’s actually a great deal of research into their development due to their usefulness in medical robotics. This lecture is a great overview for the curious:
https://youtu.be/4ktr10H04ak
This is so sick. I agree that it's a little lame that we have all these AI capabilities right now, robotics improving, and all we can think of making is humanoid robots. Like I want a spider/squid hybrid robot running around my house.
That being said, he makes some points that alternate limb types could be interesting as well.
Just basic interactions with a child plus lessons and a voice would be game changing for the toy world.
> "Teddy," he said, "I'm going to pull up flowers from the flower bed.” "No Davy . . . pulling up flowers is naughty . . . don't pull up the flowers.” The little voice squeaked and the arms waved.
> "Teddy, I'm going to break a window.” "No, Davy . . . breaking windows is naughty . . . don't break any windows . . .” "Teddy, I'm going to kill a man.” Silence, just silence. Even the eyes and the arms were still.
> The roar of the gun broke the silence and blew a ruin of gears, wires and bent metal from the back of the destroyed teddy bear.
> "Teddy . . . oh, teddy . . . you should have told me," David said and dropped the gun and at last was crying.
Like using phones as babysitters, just 100x worse.
I don't doubt someone's gonna invent it, but yikes. Imagine telling kiddo their beloved sentient toy is dead because mum and dad can't afford the ever-rising subscription fees anymore.
Beautiful work! I appreciate how this robot clearly does NOT try to look like any natural creature. I don't want a future where we can't easily distinguish nature from robotics. So far humanoid robots look clearly robotic too: hope that trend continues.
I feel the same about photorealistic renderings. We really need to be clear about which images are photographs and which are renderings today, as renderings get ever closer to photographs; meanwhile, with e.g. Starship, the actual photographs and videos are of events that until recently were science fiction.
I know that bad actors will poison the pot, but in general I'd love to see images labelled "AI", "Drawing", "Content Edited", "Colours Adjusted" where appropriate. Cropping is fine.
I'm enthralled by robotics and generative techniques. But let's not quickly confuse them with nature. Not yet.
Yeah I came here to say the same thing. It seems like it would simplify things. They do say:
"I initially considered training a single end-to-end VLA model. [...] A cable-driven soft robot is different: the same tip position can correspond to many cable length combinations. This unpredictability makes demonstration-based approaches difficult to scale.[...] Instead, I went with a cascaded design: specialized vision feeding lightweight controllers, leaving room to expand into more advanced learned behaviors later."
I still think circling back to smaller models would be awesome. With some upgrades you might get a locally hosted model on there, but I'd be sure to keep that inside a pentagram so it doesn't summon a Great One.
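As a toy illustration of the cascaded idea in the quote above: a vision module outputs a tip-position error, and a lightweight proportional controller maps it to cable length deltas. The three-cable geometry and gain below are invented for illustration, not Shoggoth Mini's actual control code:

    import numpy as np

    # Hypothetical: three cables spaced 120 degrees around the tentacle.
    CABLE_DIRS = np.array([[np.cos(a), np.sin(a)]
                           for a in (0.0, 2 * np.pi / 3, 4 * np.pi / 3)])
    KP = 0.3  # proportional gain; a real system would tune this by hand

    def cable_deltas(tip_xy: np.ndarray, target_xy: np.ndarray) -> np.ndarray:
        """Map the vision module's tip-position error to cable adjustments."""
        error = target_xy - tip_xy
        # Pulling a cable bends the tip toward that cable's side, so project
        # the error onto each cable direction to decide how much to pull.
        return KP * CABLE_DIRS @ error

    print(cable_deltas(np.array([0.0, 0.0]), np.array([0.1, -0.05])))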
Agreed! I think the Pixar lamp is a great starting point: have the robot be able to flex and bend, shake yes/no, look curious or upset, and perhaps even let it control LEDs to express itself.
The SpiRobs team did file a patent (US20210170594A1) for their pneumatic continuum robots in 2019, which was published in 2021 but appears to still be pending approval.
That would be Doctor Octopus. Yes, I would love a wearable suit with a number of tentacles for locomotion and subduing... I mean interacting... with people.
Also was thinking of Oogie Boogie from Tim Burton.