top | item 22066500

(no title)

I had a similar experience working with OCR for a similar project as OP ( https://www.askgoose.com ). What we realized was tesseract's training data was significantly different to what we were trying to transcribe (screenshots of conversations). The sparseness of text leads to bad transcriptions. One workaround is to do some image processing before hand so you can break the image down into data that looks similar to the train data used for tesseract.

discuss

No comments yet.