jbondeson|9 years ago
Document identification like this is unfortunately the "easy" part (and even that isn't particularly easy to do in real time). The next two steps involve 3D dewarping, since unlike a flatbed scanner you cannot assume the paper is actually flat -- imagine a previously folded page, etc.
I love this stuff as it sits at a crossroads of a half dozen different disciplines. Lots of money to be had if this can be done in a really robust manner.
Edit:
A couple examples of why this gets really hairy really fast:
* You'll notice that all the documents are shown on a high-contrast background (dark wood grain) without stark lighting. One of your first steps in edge detection and line identification is image segmentation -- separating foreground from background -- followed by noise removal. A white piece of paper on a white table, or a large lighting contrast (say, an open window casting daylight on half the page), really wreaks havoc with the algorithms.
* Imagine you're trying to recognize a page in the middle of a textbook. The way the page lies, you end up with non-rectangular pages (they curve toward the spine), which kills the Hough line transform (there are also Hough circle algorithms, but you get the point) and the rectangle selection.
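To see why curvature breaks it, here's a minimal NumPy sketch of the Hough line transform (the technique, not any particular library's implementation): every edge pixel votes for all (rho, theta) lines through it, and a straight edge concentrates its votes in one bin while a spine-curved edge smears them across many.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote-based line detection: each edge pixel votes for every
    (rho, theta)-parameterized line that passes through it."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for y, x in zip(*np.nonzero(edges)):
        rhos = (x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc

straight = np.zeros((100, 100), dtype=bool)
straight[20, 10:90] = True                      # flat page edge: 80 pixels

curved = np.zeros((100, 100), dtype=bool)
xs = np.arange(10, 90)
curved[20 + (xs - 50) ** 2 // 100, xs] = True   # same edge, bowed by a spine

# The straight edge produces one sharp peak of 80 votes; the curved
# edge never concentrates its votes, so its strongest "line" is weak.
print(hough_lines(straight).max(), hough_lines(curved).max())
```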
prashnts|9 years ago
For the contrast problem you mention, I found (in the few samples I tested) that adaptive thresholding seems to be sufficiently good [0].
[0] I am using ``skimage.filters.threshold_adaptive`` for this.
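For readers without skimage at hand, the same idea is a few lines of NumPy -- compare each pixel to the mean of its local window instead of to a single global threshold. This is a sketch of the technique, not skimage's exact implementation (newer skimage versions rename the function ``threshold_local``); the page values below are made up.

```python
import numpy as np

def adaptive_threshold(img, block_size=35, offset=0.02):
    """Binarize by comparing each pixel against the mean of its
    local block_size x block_size window (local-mean thresholding)."""
    pad = block_size // 2
    padded = np.pad(img, pad, mode="reflect")
    # integral image: any window sum becomes four lookups
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    s = (ii[block_size:block_size + h, block_size:block_size + w]
         - ii[:h, block_size:block_size + w]
         - ii[block_size:block_size + h, :w]
         + ii[:h, :w])
    local_mean = s / block_size ** 2
    return img < local_mean - offset   # True where "ink"

# Page under an illumination gradient: background runs 0.25 -> 0.75,
# with three rows of dark "text" (hypothetical values).
page = np.tile(np.linspace(0.25, 0.75, 200), (200, 1))
page[100:103, 50:150] = 0.05
binary = adaptive_threshold(page)
# text is recovered on both the dark and the bright side of the page,
# where any single global threshold would lose one side
```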
jbondeson|9 years ago
On 3D deformation, you're officially in academic-research land. Nearly all algorithms require a solid guess at the aspect ratio of the target object. Others use heuristics based on what you expect to find on a page. One particularly fun algorithm used the baseline of text (I believe for that paper it was Arabic), fit a high-order curve to it, and then inverted that curve to undo the warp. Unfortunately I haven't seen a truly generic approach that doesn't require an implementation-specific input.
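The baseline trick can be sketched in a few lines (hypothetical numbers; the actual papers fit higher-order models to real detected baselines): fit a polynomial to where a text baseline sits, then shift each column by the fitted curve to flatten it.

```python
import numpy as np

# Hypothetical sampled baseline of one warped text line: x positions
# and the y where the baseline actually sits (bowed by the spine).
xs = np.arange(0, 200, 10)
ys = 120 + 0.002 * (xs - 100) ** 2

# Fit a polynomial to the baseline samples...
coeffs = np.polyfit(xs, ys, deg=4)
baseline = np.polyval(coeffs, np.arange(200))

# ...then the per-column vertical shift that undoes the warp is the
# fitted curve minus the flat line we want the text to sit on.
shift = baseline - baseline.min()
flat = ys - shift[xs]          # the sampled baseline, dewarped flat
```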
[1] Frankly, my feeling is that converting RGB to grayscale is a mistake and is holding back many of these algorithms.
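A quick illustration of why (made-up pixel values): two colors can sit far apart in RGB yet land on nearly the same gray level after the usual luma weighting, so a segmentation that is trivial in color becomes impossible in grayscale.

```python
import numpy as np

# Made-up example: reddish paper lying on a greenish table.
paper = np.array([200.0, 60.0, 60.0])
table = np.array([60.0, 131.0, 60.0])

# Standard ITU-R BT.601 luma weights used for RGB -> grayscale
w = np.array([0.299, 0.587, 0.114])

print(np.linalg.norm(paper - table))   # far apart in RGB (~157)
print(paper @ w, table @ w)            # nearly identical gray levels
```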
iamflimflam1|9 years ago
http://docs.opencv.org/2.4/doc/tutorials/features2d/feature_...