However, I would strongly recommend picking only one of your "features": the indoor navigation. If I were you, I'd definitely try to build a business by concentrating only on indoor navigation!
Indoor navigation is a huge new area where all the big players are looking for possible partners/acquisitions right now! Overlay-based AR and the measuring-tape demo are a joke compared to what you've shown in indoor navigation!
You really have a chance of making a successful company based only on the indoor navigation feature. Forget the pricing for now, just offer it as a free beta on both iOS and Android and try to get the word out as much as you can.
I know of several large companies that would be seriously interested in the measurement feature. Manually measuring the rooms in a house is extremely time-consuming.
Looks very cool, but it feels like there's a big gap between Evaluation (online only, low API call limit) and Enterprise ("contact us") pricing models.
Not sure what market you're ultimately going for, but right now it seems to defeat the point of providing a nice simple API if it's only usable for either throwaway projects or by very large customers.
You make a great point, and we certainly don't want to exclude the in-between cases. We're still figuring out what those pricing tiers might look like, so if you've got an application that you're excited about, just let us know and we'll figure out a way to make it work!
These are usually implemented using structure-from-motion techniques, and more specifically in this case SLAM (Simultaneous Localization and Mapping). There are two sources of information: vision, and the inertial measurement unit (IMU) on the phone.
For the vision part, you start by extracting interest points in all images (Harris corners, SIFT, or similar), then you match them up using local patch descriptors (a reasonable implementation in OpenCV, for example, is the Lucas-Kanade optical flow tracker). Once you have the correspondences, you can estimate a relative 3D camera transformation that explains the motion. The problem is hard in this case because the depth of every point is unknown in addition to the camera transform.
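To make the geometry step concrete, here's a minimal sketch of the classic eight-point algorithm for estimating the essential matrix (which encodes the relative camera motion) from correspondences. The 3D points and camera motion here are synthetic stand-ins for tracked interest points; a real pipeline would detect and track features (e.g. with OpenCV's `cv2.calcOpticalFlowPyrLK`) and use robust estimation such as RANSAC (`cv2.findEssentialMat`) to survive bad matches.

```python
import numpy as np

# Hypothetical setup: 20 random 3D points seen from two camera poses,
# standing in for interest points matched across two video frames.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (3, 20)) + np.array([[0], [0], [5]])  # points in front of camera 1

# Ground-truth relative motion: small rotation about y, plus a translation.
theta = 0.1
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0,             1, 0            ],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.0])

x1 = X / X[2]                    # normalized homogeneous image coords, camera 1
X2 = R @ X + t[:, None]          # same points in camera-2 frame
x2 = X2 / X2[2]                  # normalized homogeneous image coords, camera 2

# Eight-point algorithm: each correspondence gives one linear constraint
# x2^T E x1 = 0 on the (row-major vectorized) essential matrix E.
A = np.stack([np.kron(x2[:, i], x1[:, i]) for i in range(x1.shape[1])])
_, _, Vt = np.linalg.svd(A)
E = Vt[-1].reshape(3, 3)         # null vector of A, reshaped to 3x3

# Compare with the ground-truth essential matrix E = [t]_x R (defined up to scale).
tx = np.array([[0, -t[2], t[1]],
               [t[2], 0, -t[0]],
               [-t[1], t[0], 0]])
E_true = tx @ R
scale = (E_true.ravel() @ E.ravel()) / (E.ravel() @ E.ravel())
E_scaled = E * scale
print(np.allclose(E_scaled, E_true, atol=1e-6))
```

Note that E is only recovered up to scale, which is exactly the monocular scale ambiguity the IMU helps resolve.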
For the IMU stream you can use the accelerometer and gyro on the phone, which give you estimates of linear acceleration and angular velocity. These can be integrated over time to get a reasonable guess for the camera transformation from one time point to another as well.
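As a sketch of what that integration looks like, here is dead reckoning on a made-up constant-rate IMU stream: the gyro is integrated once for orientation, the accelerometer twice for position. Real IMU data is noisy and biased, and double integration makes those errors grow quadratically, which is why the vision estimate is needed to keep the guess honest.

```python
import numpy as np

dt = 0.01                        # hypothetical 100 Hz IMU sample rate
n = 200                          # two seconds of data
# Hypothetical IMU stream: constant yaw rate, constant forward acceleration.
gyro_z = np.full(n, 0.5)         # rad/s, angular velocity from the gyro
accel_x = np.full(n, 0.2)        # m/s^2, linear acceleration from the accelerometer

yaw = np.cumsum(gyro_z) * dt     # integrate once: orientation guess
vel = np.cumsum(accel_x) * dt    # integrate once: velocity guess
pos = np.cumsum(vel) * dt        # integrate again: position guess

# Analytically: yaw = w*t = 1.0 rad, pos ~ 0.5*a*t^2 = 0.4 m (plus discretization error).
print(round(yaw[-1], 3), round(pos[-1], 3))
```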
You combine the two guesses (from vision and from the phone's inertial measurement units) into a best guess, and then combine that with the best guess from 30 milliseconds ago to arrive at an evolving probability distribution of this best guess over time. The standard way would be something like a Kalman filter.
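A minimal sketch of that fusion, using a 1D Kalman filter with made-up noise levels: the IMU drives the predict step, the vision-based position estimate drives the update step, and the variance P is the evolving uncertainty of the best guess. A real system tracks full 6-DoF pose and velocity, but the structure is the same.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.03                          # ~30 ms between video frames, as above
true_vel = 1.0                     # hypothetical true camera speed, m/s

x, P = 0.0, 1.0                    # state estimate (position) and its variance
Q, R = 0.01, 0.25                  # assumed process (IMU) and measurement (vision) noise

true_pos = 0.0
for _ in range(100):
    true_pos += true_vel * dt
    # Predict: propagate the previous best guess with the noisy IMU velocity.
    imu_vel = true_vel + rng.normal(0, 0.1)
    x = x + imu_vel * dt
    P = P + Q
    # Update: fold in the noisy vision-based position estimate.
    z = true_pos + rng.normal(0, 0.5)
    K = P / (P + R)                # Kalman gain: how much to trust vision this frame
    x = x + K * (z - x)
    P = (1 - K) * P

print(f"fused position {x:.2f}, true position {true_pos:.2f}, variance {P:.3f}")
```

The fused estimate ends up far less noisy than either source alone, because the filter weights each by its current uncertainty.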
Another issue is dealing with drift over time: errors in estimation build up, and if you're scanning the same area your model will start to drift. This requires something called "loop closure", which optimizes the camera matrices across the entire duration of the scan and not only frame to frame. This is very computationally intensive and hard to do online, and without it, scans longer than a few seconds will get progressively uglier and more misaligned.
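A toy illustration of drift and loop closure, using 1D headings: frame-to-frame estimates accumulate noise, and recognizing the starting point adds a global constraint that lets you correct the whole trajectory. Here the residual is simply spread evenly over all steps; real SLAM systems instead solve a nonlinear least-squares pose graph over full 6-DoF camera poses, which is where the computational cost comes from.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 36
true_step = 2 * np.pi / n           # camera turns 10 degrees per frame, completing a loop

# Frame-to-frame heading estimates each carry a little noise -> cumulative drift.
meas = true_step + rng.normal(0, 0.01, n)
headings = np.cumsum(meas)
drift = headings[-1] - 2 * np.pi    # error visible once we return to the start
print(f"drift before closure: {drift:+.4f} rad")

# Loop closure: we recognize the starting view again, so the steps must sum to 2*pi.
# Simplest possible global correction: distribute the residual evenly over all steps.
corrected = np.cumsum(meas - drift / n)
print(f"drift after closure:  {corrected[-1] - 2 * np.pi:+.4f} rad")
```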
This stuff is super tricky to get right. Also, be skeptical of these demos, because they are easy to stage: it's fairly easy to get that one shot where it looks like it works, but in practice these systems are exceptionally fragile and very, very difficult to get working. That said, I'm impressed it seemed to work okay inside the mall, with all the specular reflections from the floor. Though I'd guess that if anyone placed a foot into the field of view (and made the environment geometry non-static) it would all break :) Good luck to the team though!
I'm fairly sure that the tech is based around "structure from motion". The API simultaneously estimates the position of the camera at each point in time, and the location of some reference points (blue circles in the vid).
Because the device has an accelerometer, it is even able to extract absolute distances, not just relative ones. I'm actually surprised by this, as I always assumed the accelerometer was too noisy to be of use here.
I tried to do a similar thing myself, but the problem is technically very difficult. While a lot of research has been done on structure from motion, actually packaging it into a usable API is a big task.
This technology is awesome! If it's half as impressive in real life as the demo suggests you have done a fantastic job creating some really innovative technology. I wish you the best of luck in turning it into a real business!