Show HN: I created a PoC for live descriptions of the surroundings for the blind
73 points | o40 | 1 year ago | github.com
The Ray-Ban Meta glasses, for example, are getting accessibility features. The functionality is promising according to reviews, but requires the user to say "Hey Meta, what am I looking at?" every time a scene is to be described. The battery life seems underwhelming as well.
It would be nice to have a cheap and open source alternative to the currently available products, one where the user is fed information continuously rather than having to request it every time. This made me curious to see whether I could build a solution using an ESP32 WiFi camera, and learn some Arduino development in the process.
I managed to create a solution where the camera connects to the phone's personal hotspot and publishes an image every 7 seconds to an online server. The server uses the gpt-4o-mini model to describe the image and updates a web page, which is read back to the user using voice synthesis. The latency is under 2 seconds, and often faster.
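For anyone curious about the plumbing, here is a minimal sketch of what the camera side of this kind of pipeline could look like on an ESP32-CAM with the Arduino core. The SSID, password, and upload URL are placeholders, the camera init is omitted because pin mappings are board-specific, and this is an illustration of the approach rather than the project's actual code:

    #include <WiFi.h>
    #include <HTTPClient.h>
    #include "esp_camera.h"

    // Placeholder credentials and endpoint; replace with your own.
    // (An https:// endpoint would additionally need WiFiClientSecure.)
    const char* WIFI_SSID  = "my-phone-hotspot";
    const char* WIFI_PASS  = "hotspot-password";
    const char* UPLOAD_URL = "http://example.com/upload";

    void setup() {
      Serial.begin(115200);

      // Camera init omitted for brevity: esp_camera_init() needs a
      // camera_config_t with the pin mapping for your board
      // (e.g. AI-Thinker ESP32-CAM), JPEG pixel format, frame size, etc.

      WiFi.begin(WIFI_SSID, WIFI_PASS);        // join the phone's hotspot
      while (WiFi.status() != WL_CONNECTED) {
        delay(500);
      }
    }

    void loop() {
      camera_fb_t* fb = esp_camera_fb_get();   // grab one JPEG frame
      if (fb) {
        HTTPClient http;
        http.begin(UPLOAD_URL);
        http.addHeader("Content-Type", "image/jpeg");
        int status = http.POST(fb->buf, fb->len);  // raw JPEG bytes as body
        Serial.printf("Upload status: %d\n", status);
        http.end();
        esp_camera_fb_return(fb);              // return buffer to the driver
      }
      delay(7000);  // publish one frame every 7 seconds, as described above
    }

The server behind the placeholder URL would then pass each frame to gpt-4o-mini and push the resulting description to the web page that the voice synthesis reads out.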
I am happy with the result and learnt a lot, but I think I will pause this project for now, at least until some shiny new tech emerges (cheaper open source camera glasses).
biosboiii|1 year ago
https://altayakkus.substack.com/p/you-wouldnt-download-an-ai
miki123211|1 year ago
I don't see the point of this over just using a cell phone app that does the same thing, and those are slowly starting to appear.
o40|1 year ago
I have not done any app development, and for this project I wanted to keep things simple and focus on what can be expected from a low-quality camera combined with AI-generated descriptions.
Someone|1 year ago
I think you need to triple-check whether users actually find that nice.
Assuming that keeping the text limited to what interests the user will stay an unsolved problem for the foreseeable future, I guesspect that they prefer a middle ground where they aren’t continuously bombarded with text, but it’s easy to get that flow going. For example, having that text feed on only while a button is being held down.
I guesspect that because I think users would soon be fed up with an assistant that says there's a painting on the wall or a church tower in the distance every time they turn their head.
Both can be useful information, but not when you hear them for the thousandth time while in your own house/garden.
o40|1 year ago
I wanted to create the opposite of something that needs "Hey Google, describe what is in front of me" or similar. Another goal was to see how cheap/simple you can go and still get valuable information.
three2a88|1 year ago
https://www.youtube.com/watch?v=Wuntz3KDIAk
o40|1 year ago
My hope is that there will be "cheap" camera glasses that you can use with different services for image descriptions. There is a company called "Be My Eyes" that is developing an AI tool for image descriptions, which probably is miles better than anything I can come up with. https://www.bemyeyes.com/blog/introducing-be-my-ai
Be My Eyes seems to support the Ray-Ban Meta glasses, so hopefully "Be My AI" will too.
I understand the "not consulting the target audience" problem all too well, for instance braille signs that are placed at eye level and are hard to find. Some workplaces are very keen to make accessibility adjustments, but mostly visible ones, so that they can show others that adjustments have been made, regardless of whether they actually help.