
Show HN: I created a PoC for live descriptions of the surroundings for the blind

73 points | o40 | 1 year ago | github.com

The difference in cost between products that are developed as accessibility tools compared to consumer products is huge. One example is camera glasses where the accessibility product costs ~$3000 (Envision Glasses), and the consumer product costs ~$300 (Ray-Ban Meta).

In this case the Ray-Ban Meta is getting accessibility features. The functionality is promising according to reviews, but requires the user to say "Hey Meta, what am I looking at" every time a scene is to be described. The battery life seems underwhelming as well.

It would be nice to have a cheap and open source alternative to the currently available products, where the user gets fed information rather than continuously requesting it. This is where I became interested in seeing whether I could create a solution using an ESP32 WiFi camera, and learn some Arduino development in the process.

I managed to create a solution where the camera connects to the phone's personal hotspot and publishes an image every 7 seconds to an online server, which uses the gpt-4o-mini model to describe the image and update a web page that is read back to the user via voice synthesis. The latency is under 2 seconds, and often lower.
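For context, here is a sketch of the request the server side might build for each frame. The payload shape follows the public OpenAI chat-completions vision API; the function name and prompt are my own illustration, not taken from the repo:

```python
import base64
import json

def build_vision_request(jpeg_bytes: bytes, prompt: str) -> dict:
    """Wrap one camera frame in a chat-completions payload for gpt-4o-mini.

    The image travels inline as a base64 data URL, so the frame never has
    to be hosted anywhere the model would need to fetch it from.
    """
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": "gpt-4o-mini",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

payload = build_vision_request(
    b"\xff\xd8...",  # placeholder for real JPEG bytes from the ESP32
    "Briefly describe this scene for a blind user.",
)
print(json.dumps(payload)[:40])
```

The server would POST this JSON to the chat-completions endpoint for each frame, push the returned text to the web page, and let the browser's speech synthesis read it out.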

I am happy with the result and learnt a lot, but I think I will pause this project for now. At least until some shiny new tech emerges (cheaper open source camera glasses).

25 comments


miki123211|1 year ago

Blind person here.

I don't see the point of this over just using a cell phone app, and those are slowly starting to appear.

o40|1 year ago

Yes, apps are for sure the best solution for this. Hopefully something like "Be My AI" in combination with consumer products such as the Ray-Ban Meta, where you can get descriptions without telling the world that you are requesting them.

I have not done any app development, and for this project I wanted to keep some things simple to focus on what can be expected from a low-quality camera in combination with AI descriptions.

oulipo|1 year ago

Hi! Could you tell me what are your favorite devices / apps to get descriptions of scenery? Are you a coder? Would you point me to the best setup for coding for a blind person? Thanks

Someone|1 year ago

> It would be nice to have a cheap and open source alternative to the currently available products, where the user gets fed information rather than continuously requesting it

I think you need to triple-check whether users actually find that nice.

Assuming that keeping the text limited to what interests the user will stay an unsolved problem for the foreseeable future, I guesspect that they prefer a middle ground where they aren’t continuously bombarded with text, but it’s easy to get that flow going. For example, having that text feed on only while a button is being held down.

I guesspect that because I think users would soon be fed up with an assistant that says there's a painting on the wall or a church tower in the distance every time they turn their head.

Both can be useful information, but not when you hear them for the thousandth time while in your own house/garden.

o40|1 year ago

Yes, repeated information is not great in many cases. A more advanced system could possibly keep track of which information is new and which information is already known to the user.
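One minimal way to suppress repeats (a sketch of my own, not code from the project): keep the last few spoken descriptions and skip a new one whose word-set Jaccard similarity to any of them is too high:

```python
def is_novel(description: str, recent: list[str], threshold: float = 0.6) -> bool:
    """Return True if the description differs enough from recently spoken
    ones to be worth reading aloud (Jaccard similarity of word sets)."""
    words = set(description.lower().split())
    for old in recent:
        old_words = set(old.lower().split())
        union = words | old_words
        if union and len(words & old_words) / len(union) >= threshold:
            return False  # too similar to something already announced
    return True

recent = ["a painting hangs on the wall"]
print(is_novel("a painting hangs on the wall above a sofa", recent))  # False
print(is_novel("a red bicycle leans against the fence", recent))      # True
```

A real system would also want to age entries out of `recent`, so that a scene is re-announced after the user has moved away and come back.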

I wanted to create something that is the opposite of needing to say "Hey Google, describe what is in front of me" or similar. Another point was to see how cheap/simple you can go and still get valuable information.

nels|1 year ago

Nice work! You may be interested in a paper that explored a similar concept and included some interesting ways of dealing with latency called WorldScribe: https://worldscribe.org.

o40|1 year ago

Very interesting. This in combination with something that "tracks" described objects, so they don't need to be described again, would be a game changer.

tetrisgm|1 year ago

Wonderful effort. Congrats and I hope this keeps developing forward.

rkagerer|1 year ago

Interesting, I had no idea there were "Sight as a Service" offerings.

oniony|1 year ago

I love how the descriptions after the prompt was fixed now read like the descriptions of the scenes in the 1982 video game The Hobbit.

lionkor|1 year ago

I love abbreviations! Is it a point of care? A piece of crap? A proof of concept? All of them would work :)

MrVandemar|1 year ago

Did you consult with your target audience — ie. blind or low-vision people — before or during development?

o40|1 year ago

Yes. My partner is visually impaired, so that is one of the reasons I think this is interesting to investigate. The current solution is way too "janky" to actually use, but it gives insight into the problem to solve.

My hope is that there will be "cheap" camera glasses with which you can use different services for image descriptions. There is a company called "Be My Eyes" that is developing an AI tool for image descriptions, which is probably miles better than anything I can come up with. https://www.bemyeyes.com/blog/introducing-be-my-ai

Be My Eyes seems to support the Ray-Ban Meta glasses, so hopefully "Be My AI" will too.

I understand the "not consulting the target audience" problem all too well, for instance braille signs that are placed at eye level and are hard to find. Some workplaces are very keen to make accessibility adjustments, but mostly if they are visible, so that they can show others that adjustments have been made, regardless of whether they actually help.

xnx|1 year ago

Neat. So this is like the free Google Lookout app, but with more emphasis on the scene than on objects.

smitty1e|1 year ago

PoC means "point of care" in this context?

pockybum522|1 year ago

Proof of Concept

rusty_venture|1 year ago

Does it say "I am a lamp. I am a lamp."?

sajb|1 year ago

"You are likely to be eaten by a grue."