item 9557122

Navigation for the Visually Impaired Using a Google Tango RGB-D Tablet

18 points | DanAndersen | 11 years ago | dan.andersen.name

7 comments

DanAndersen | 11 years ago
This is a project I've been working on for the past couple of months, using the Project Tango tablet to build a navigation system for people with visual disabilities.

It uses pose estimation and point cloud data to (1) build a chunk-based voxel environment of the user's surroundings, (2) render a set of depth maps surrounding the user, and (3) use the depth maps and OpenAL to generate 3D audio that indicates where mapped obstacles are.
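A minimal sketch of step (1), the chunk-based voxel map built from point cloud data. The 5 cm voxel size, 16-voxel chunk edge, and dict-of-sets layout are all illustrative assumptions, not the writeup's actual parameters:

```python
from collections import defaultdict

VOXEL_SIZE = 0.05   # meters per voxel (assumed)
CHUNK_SIZE = 16     # voxels per chunk edge (assumed)

def voxel_key(point):
    """Quantize a world-space (x, y, z) point to integer voxel coords."""
    return tuple(int(c // VOXEL_SIZE) for c in point)

def chunk_key(voxel):
    """Group voxels into fixed-size chunks so updates stay local."""
    return tuple(v // CHUNK_SIZE for v in voxel)

class VoxelMap:
    def __init__(self):
        # chunk key -> set of occupied voxel coords in that chunk
        self.chunks = defaultdict(set)

    def insert_point_cloud(self, points):
        """Mark the voxel containing each depth-sensor point as occupied."""
        for p in points:
            v = voxel_key(p)
            self.chunks[chunk_key(v)].add(v)

    def occupied(self, point):
        """Has any previous point cloud hit this point's voxel?"""
        v = voxel_key(point)
        return v in self.chunks.get(chunk_key(v), set())
```

Because the map is keyed by chunk, only the chunks near the user need to be touched or rendered when generating the surrounding depth maps in step (2).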

I don't have it in a state where folks can try it out, but I did do a writeup of my approach and wanted to share it.

Demonstration video (with quiet audio) here: https://youtu.be/EnNuDiJazBs

dm2 | 11 years ago
This is awesome, great job, I think you've just given humans echolocation.

If someone was given a similar device at an early age that was semi-permanently attached to them, would their brain possibly be able to create a map of the room?

There have been previous attempts, but the Tango device didn't exist then, so the hardware was bulky and usually required a backpack.

DanAndersen | 11 years ago
I definitely think it would be possible. I find it interesting to think about eyesight in the same way -- even though an image is projected onto our retinas, there's not a little homunculus looking at our retinas to see the image; it gets translated into electrical signals that our brain interprets. There seems to be a great amount of plasticity in the brain that lets us remap senses and view tools as extensions of our bodies.

There has been some prior work on using depth cameras for navigation for the visually impaired. For example, a smart cane can detect objects beyond its reach and give haptic feedback. Microsoft Research did some work with putting the Kinect on a helmet and giving audio cues for navigation (http://research.microsoft.com/pubs/184208/VisionForTheBlind....). What I'm interested in is taking that sensory input and making it less immediate by giving it a memory -- letting it build up a picture of an environment rather than needing to point the device at something in order to learn about it.

One big issue is figuring out how to sonify depth information so it's useful. One simple approach is to sweep across each frame from left to right, letting each row of the image correspond to a certain pitch. I don't think this is a good approach: it seems very vision-oriented and is likely to sound just like noise. It might work if someone used it from birth, but for relatively fast training I doubt it. Other approaches do more interpretation -- Microsoft's work detected faces, walls, and floors, giving each a distinct sound for greater recognition.
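For concreteness, the left-to-right sweep idea above can be sketched as follows. Each column of a depth image becomes a time slice, each row maps to a pitch (top = high, bottom = low), and nearer obstacles sound louder. The frequency range, sweep time, and depth-to-loudness mapping are all assumptions for illustration:

```python
import math

def sweep_sonify(depth, sample_rate=8000, sweep_time=1.0,
                 f_low=200.0, f_high=2000.0, max_depth=4.0):
    """Turn a depth image (list of rows, meters) into mono audio samples.

    Sweeps the image left to right over sweep_time seconds, summing one
    sine per row; amplitude falls off linearly with distance.
    """
    rows, cols = len(depth), len(depth[0])
    samples_per_col = int(sample_rate * sweep_time / cols)
    # Top row gets the highest pitch, bottom row the lowest.
    freqs = [f_high - r * (f_high - f_low) / max(rows - 1, 1)
             for r in range(rows)]
    out = []
    for c in range(cols):
        for n in range(samples_per_col):
            t = n / sample_rate
            s = 0.0
            for r in range(rows):
                d = depth[r][c]
                amp = max(0.0, 1.0 - d / max_depth)  # nearer -> louder
                s += amp * math.sin(2 * math.pi * freqs[r] * t)
            out.append(s / rows)  # normalize so samples stay in [-1, 1]
    return out
```

Listening to the output of even a tiny depth image makes the "sounds like noise" objection concrete: every column mixes many simultaneous tones, so untrained listeners get little structure out of it.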

extra88 | 11 years ago
Humans can already learn echolocation [1]. Still, there are many possibilities for machine-assisted perception/translation. I think the post correctly identifies finding good ways to aurally represent the information as one of the challenges.

[1] http://en.wikipedia.org/wiki/Daniel_Kish