Show HN: From Photos to Positions: Prototyping VLM-Based Indoor Maps

55 points | accurrent | 7 months ago | arjo129.github.io

Just a fun hack I did while bored over the weekend. My wife was busy shopping, and it got me thinking: can VLMs solve the indoor location problem in a mall? Can I just show a VLM a map and an image and have it do a good enough job of locating me? I hacked together this PoC and it seems to work.

2 comments

rohanrao123|7 months ago

Pretty cool! It reminded me of this work from NVIDIA Research - https://nvidia-ai-iot.github.io/remembr where they used VLMs and RAG on top of a real robot to navigate the Voyager campus in Santa Clara. You also might like the new OpenAI o3 models and how well they can play GeoGuessr ;)

https://simonwillison.net/2025/Apr/26/o3-photo-locations, https://news.ycombinator.com/item?id=43835044, https://www.astralcodexten.com/p/testing-ais-geoguessr-geniu...

accurrent|7 months ago

Yep, I've seen the NVIDIA research stuff. It's pretty cool.