toisanji|3 months ago
There are actually a lot of people trying to figure out spatial intelligence, but those groups are usually in neuroscience or computational neuroscience. Here is a summary paper I wrote discussing how the entorhinal cortex, grid cells, and coordinate transformation may be the key: https://arxiv.org/abs/2210.12068
All animals can transform coordinates in real time to navigate their world, and humans have more coordinate representations than any other known animal. I believe human-level intelligence is knowing when and how to transform these coordinate systems to extract useful information. I wrote this before the huge LLM explosion, and I still personally believe it is the path forward.
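To make the coordinate-transformation idea concrete, here's a minimal 2-D sketch (my own illustration, not code from the paper): converting a point between an agent's egocentric frame and allocentric world coordinates is just a rotation plus a translation.

```python
import math

def ego_to_world(agent_xy, agent_heading, ego_xy):
    """Rotate an egocentric offset by the agent's heading, then translate
    by the agent's position, yielding world (allocentric) coordinates."""
    ax, ay = agent_xy
    ex, ey = ego_xy
    c, s = math.cos(agent_heading), math.sin(agent_heading)
    return (ax + c * ex - s * ey, ay + s * ex + c * ey)

def world_to_ego(agent_xy, agent_heading, world_xy):
    """Inverse transform: express a world point in the agent's frame."""
    ax, ay = agent_xy
    wx, wy = world_xy
    dx, dy = wx - ax, wy - ay
    c, s = math.cos(-agent_heading), math.sin(-agent_heading)
    return (c * dx - s * dy, s * dx + c * dy)

# An object 1 unit "ahead" of an agent standing at (2, 3) facing +y
p = ego_to_world((2, 3), math.pi / 2, (1, 0))   # -> (2.0, 4.0)
# Round-tripping recovers the egocentric view
q = world_to_ego((2, 3), math.pi / 2, p)        # -> (1.0, 0.0)
```

Animal brains appear to maintain many such frames at once (eye-, head-, body-, and world-centered), and the claim in the paper is that intelligence lies in choosing which transform to apply when.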
Animats|3 months ago
Right. I was thinking about this back in the 1990s. That resulted in a years-long detour through collision detection, physically based animation, solving stiff systems of nonlinear equations, and a way to do legged running over rough terrain. But nothing like "AI". More of a precursor to the analytical solutions of the early Boston Dynamics era.
Work today seems to throw vast amounts of compute at the problem and hope a learning system will come up with a useful internal representation of the spatial world. It's the "bitter lesson" approach. Maybe it will work. Robotic legged locomotion is pretty good now. Manipulation in unstructured situations still sucks. It's amazing how bad it is. There are videos of unstructured robot manipulation from McCarthy's lab at Stanford in the 1960s. They're not that much worse than videos today.
I used to make the comment, pre-LLM, that we needed to get to mouse/squirrel level intelligence rather than trying to get to human level abstract AI. But we got abstract AI first. That surprised me.
There's some progress in video generation which takes a short clip and extrapolates what happens next. That's a promising line of development. The key to "common sense" is being able to predict what happens next well enough to avoid big mistakes in the short term, a few seconds. How's that coming along? And what's the internal world model, assuming we even know?
Earw0rm|3 months ago
A machine can infer the right (or expected) answer from data. I'm not sure the same is true for how living things navigate the physical world - the "right" answer, insofar as one exists for your squirrel, is arguably Darwinian: "whatever keeps the little guy alive today".
imtringued|3 months ago
https://www.youtube.com/watch?v=udPY5rQVoW0
This has been a thing for a while. It's actually a funny way to demonstrate model-based control: replace the controller with a human.
nosianu|3 months ago
"AI" is not based on physical real-world data and models the way our brains are. Instead, we chose to analyze formal human (written) communication. ("Formal" because actual face-to-face communication has many dimensions beyond the text of what is said, from tone and speed to facial expression and body language.)
Bio-brains build a model from physical sensor data first and go from there; that's completely missing from "AI".
In hindsight it's not surprising: we skipped that hard part (for now?). Working with symbols is what we've been doing with IT for a long time.
I'm not sure going all out on basing something on human intelligence, i.e. human neural networks, is a winning move. It's as if we had tried to build airplanes that flap their wings. For one, human intelligence already exists, and when you lean back and look at how we do on small and large problems from an outside perspective, it has plenty of blind spots and disadvantages.
I'm afraid that if we managed a hundred percent human-level AI, we would be disappointed. Sure, it would be able to do a lot, but in the end, nothing we don't already have.
Right now that would also cover only the abstract parts. I think the physical "moving the body" parts, in relation to abstract commands, would be the far more interesting piece, but current AI is not about using physical sensor data at all, never mind combining it with the abstract stuff.
bonsai_spool|3 months ago
Yes, you and the Mosers who won the Nobel Prize all believe that grid cells are the key to animals understanding their position in the world.
https://www.nobelprize.org/prizes/medicine/2014/press-releas...
Marshferm|3 months ago
There's a whole giant gap between grid cells and intelligence.
juliangamble|3 months ago
I'll add to the discussion a 2018 Nature letter: "Vector-based navigation using grid-like representations in artificial agents" https://www.nature.com/articles/s41586-018-0102-6
and a 2024 Nature article "Modeling hippocampal spatial cells in rodents navigating in 3D environments" https://www.nature.com/articles/s41598-024-66755-x
And a simulation in Github from 2018 https://github.com/google-deepmind/grid-cells
People have been looking at spatial awareness in neuroscience for quite a while, at least relative to the timeframe of recent developments in LLMs.
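For intuition about what those grid-cell papers are modeling, here's a toy sketch (my own simplification, not the DeepMind repo's code): a grid cell's hexagonal firing map can be approximated by summing three cosine gratings oriented 60 degrees apart.

```python
import math

def grid_cell_rate(x, y, scale=1.0, phase=(0.0, 0.0)):
    """Toy grid-cell firing rate: the sum of three plane waves whose
    orientations are 60 degrees apart produces the hexagonal firing
    lattice reported in entorhinal cortex recordings."""
    rate = 0.0
    for k in range(3):
        theta = k * math.pi / 3  # 0, 60, 120 degrees
        kx, ky = math.cos(theta), math.sin(theta)
        rate += math.cos(2 * math.pi / scale
                         * (kx * (x - phase[0]) + ky * (y - phase[1])))
    # The raw sum ranges over [-1.5, 3]; rescale to [0, 1]
    return (rate + 1.5) / 4.5

# Firing peaks at the cell's preferred lattice points, e.g. the phase origin
peak = grid_cell_rate(0.0, 0.0)  # -> 1.0
```

Varying `scale` and `phase` across a population of such cells gives a positional code, which is roughly what the DeepMind agents learned to produce on their own.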
imtringued|3 months ago
>3. Interactive: World models can output the next states based on input actions
>Finally, if actions and/or goals are part of the prompt to a world model, its outputs must include the next state of the world, represented either implicitly or explicitly. When given only an action with or without a goal state as the input, the world model should produce an output consistent with the world’s previous state, the intended goal state if any, and its semantic meanings, physical laws, and dynamical behaviors. As spatially intelligent world models become more powerful and robust in their reasoning and generation capabilities, it is conceivable that in the case of a given goal, the world models themselves would be able to predict not only the next state of the world, but also the next actions based on the new state.
That's literally just an RNN (not a transformer): it takes a previous state and an input and produces a new state. Add a controller on top and you have model predictive control. The most extreme form I've seen is temporal-difference model predictive control (TD-MPC). [0]
[0] https://arxiv.org/abs/2203.04955
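A minimal sketch of that state-transition-plus-controller idea (toy 1-D point-mass dynamics of my own invention; this is not TD-MPC itself): `step` stands in for the world model, and random-shooting MPC plans over it.

```python
import random

def step(state, action):
    """Hypothetical world-model transition f(s, a) -> s'.
    Here the "world" is a 1-D point mass: state = (position, velocity)."""
    pos, vel = state
    vel = vel + 0.1 * action   # action accelerates the mass
    pos = pos + 0.1 * vel
    return (pos, vel)

def mpc(state, goal, horizon=10, candidates=200):
    """Random-shooting model predictive control: roll the model forward
    under sampled action sequences and return the first action of the
    sequence whose final position lands closest to the goal."""
    best_cost, best_action = float("inf"), 0.0
    for _ in range(candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s = state
        for a in seq:
            s = step(s, a)
        cost = (s[0] - goal) ** 2
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action

first_action = mpc((0.0, 0.0), goal=1.0)
```

In closed loop you would call `mpc` again after every observed state; TD-MPC's refinement is to learn a value function so the planning horizon can stay short.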
ACCount37|3 months ago
Trying to copy biological systems 1:1 rarely works, and copying them doesn't seem to be required either. CNNs are somewhat brain-inspired, but only somewhat, and LLMs have very little architectural similarity to the human brain beyond being artificial neural networks.
What functional similarity LLMs do have to the human brain doesn't come from reverse-engineered details of how the brain works - it comes from the training process.
toisanji|3 months ago
Deep Mind also did a paper with grid cells a while ago: https://deepmind.google/blog/navigating-with-grid-like-repre...
porphyra|3 months ago
I mean, she launched her whole career with ImageNet, so you can hardly blame her for thinking that way. But on the other hand, there's something bitter-lesson-pilled about letting a model "figure out" spatial relationships just by looking at tons of data. And tbh the recent progress [1] of worldlabs.ai (Dr. Fei-Fei Li's startup) looks quite promising for a model that understands scenes, including reflections.
[1] https://www.worldlabs.ai/blog/rtfm
godelski|3 months ago
As for reflections, I don't get that impression either. They seem extremely brittle to movement.
[0] http://0x0.st/K95T.png