atonse|6 days ago
I would argue that yes, we do use vision, but we get that "lidar depth" from our stereo vision. That used to be why I thought cameras weren't enough.
But then look at all the work on Gaussian splatting, where you take multiple 2D samples and build a 3D world out of them. You could probably get 80% of the way there with just that.
The ethos of many Musk companies (you'll hear this from many engineers who work there) is simplify, simplify, simplify. If something isn't needed, take it out. Question everything that might be needed.
To me, LIDAR is just one of those things in that general pattern of "if it isn't absolutely needed, take it out" – and the fact that FSD works so well without it suggests it isn't required. It's probably a nice-to-have, but maybe not a necessity.
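The stereo-depth point above can be made concrete: with a rectified stereo pair, depth falls out of pixel disparity by simple triangulation. A minimal sketch, with illustrative camera numbers (not from any real rig):

```python
# Minimal sketch of stereo depth via triangulation, assuming an idealized
# rectified pin-hole stereo pair: focal length f in pixels, baseline B in
# meters. All numbers below are illustrative, not from a real camera.

def stereo_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A feature shifted 20 px between left and right images, with f = 1000 px
# and a 10 cm baseline, sits 5 m away:
print(stereo_depth(20.0, 1000.0, 0.10))  # 5.0
```

Note that depth error grows quadratically with distance for a fixed disparity error, which is one reason stereo alone degrades at long range.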
dymk|6 days ago
You're listening to the road and the sounds of the cars around you. You're feeling vibration from the road. You're feeling feedback through the steering wheel. You're using a combination of monocular and binocular depth perception – and your eyes are not fixed-focal-length cameras. You're moving your head to change the perspective you see the road from. Your inner ear is telling you about your acceleration and orientation.
saltcured|6 days ago
However, there is also a lot of interaction between our perceptual system and cognition. Just for depth perception, we're doing a lot of temporal analysis. We track moving objects and infer distance from assumptions about scale and object permanence. We don't just repeatedly make depth maps from 2D imagery.
The brute-force approach is something like training vision-language models (VLMs). E.g. you could train on lots of movies and be able to predict "what happens next" in the imaging world.
But, compared to LLMs, there is a bigger gap between the model and the application domain with VLMs. It may seem like LLMs are being applied to lots of domains, but most are just tiny variations on the same task of "writing what comes next", which is exactly what they were trained on. Unfortunately, driving is not "painting what comes next" in the way all these LLM writing hacks are. There is still a big gap between that predictive layer and actual planning and execution. Our giant corpus of movies does not really provide ready-made training data for those bigger problems.
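The "predict what happens next" objective can be sketched in miniature. This toy reduces frames to 1-D values and scores the simplest possible autoregressive predictor; everything here is hypothetical and only meant to show the loss structure, not a real VLM:

```python
# Toy sketch of the next-frame prediction objective described above,
# reduced to 1-D "frames". Real VLMs predict tokens of future frames with
# a transformer, but the loss structure is the same: predict frame t+1
# from frames <= t and score the error.
import numpy as np

rng = np.random.default_rng(0)
frames = np.cumsum(rng.normal(size=100))  # a fake "movie": a 1-D random walk

# Baseline autoregressive predictor: next frame = current frame.
predictions = frames[:-1]
targets = frames[1:]
mse = float(np.mean((predictions - targets) ** 2))
print(mse)  # roughly 1.0, since each step is unit-variance noise
```

Even a perfect next-frame predictor only gives you the prediction layer; the comment's point is that planning and execution sit on top of it and have no comparable training corpus.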
DesaiAshu|6 days ago
We often greatly underestimate and undervalue the role of our ears relative to our vision. As my film-director friend says, 80% of the impact in a movie is in the sound.
stefan_|6 days ago
Now you might say "use a depth model to estimate metric depth." But spend five minutes thinking about why a magic math box that pretends to recover real depth from a single 2D image is a very, very sketchy proposition when it needs to be correct for emergency braking rather than for some TikTok bokeh filter, and you will see that this doesn't get you far either.
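The underlying problem is geometric, not just a modeling weakness: under a pin-hole camera, uniformly rescaling the whole scene produces an identical image, so a single frame carries no absolute scale. A toy illustration (focal length and geometry made up):

```python
# Illustrative sketch of why single-image metric depth is ill-posed: under
# a pin-hole model, scaling the scene by any factor k leaves every pixel
# unchanged, so the image alone cannot fix absolute distance.

def project(f: float, x: float, z: float) -> float:
    """Pin-hole projection: pixel coordinate u = f * X / Z."""
    return f * x / z

f = 1000.0  # focal length in pixels (illustrative)
near = project(f, x=1.0, z=5.0)   # 1 m-wide object 5 m away
far = project(f, x=10.0, z=50.0)  # object 10x larger, 10x farther
print(near, far)  # identical pixel coordinate: 200.0 200.0
```

Monocular depth models break this tie only by leaning on learned priors about object sizes, which is exactly the "magic math box" behavior the comment is objecting to for safety-critical use.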
nindalf|6 days ago
Sufficient to build something close to human performance. But self-driving cars will be held to a much higher standard by society – a standard only achievable with sensors like LiDAR.
anthonypasq|6 days ago
Whether that's worth completely throwing away LiDAR is a different question, but your argument is just obviously false.
BurningFrog|6 days ago
They also have several cameras all around, providing constant 360° vision.
atonse|5 days ago
In fact, that's why radio/music/podcasts thrive. Because we're bored when we drive. We have conversations, etc. We daydream.
As long as the skills relevant to actually driving are at parity with humans, the rest doesn't matter.
In fact, in a recent podcast, Musk mused that you actually may have a limit of how smart you want a vehicle model to be, because what if IT starts to get bored? What will it do? I found that to be an interesting (and amusing) thought exercise.
maxdo|6 days ago
It's not only failing, it's causing false positives.
thinkcontext|6 days ago
The reports that Tesla submits on the Austin robotaxis include several incidents of them hitting fixed objects. This is the same behavior reported for prior versions of their software, Teslas not seeing objects, including the incident for which the $250M verdict against them was reaffirmed this past week. That this is occurring in an extensively mapped environment, and with a safety driver on board, leads me to the opposite conclusion from the one you have reached.