Thanks for sharing so much detail. I am the CTO & Co-Founder of https://turbotable.ai (the landing page is outdated and will be updated soon), a similar product in the space, but mainly focused on more general automation and data analysis for non-technical teams. OCR is one of the tools in our arsenal, and our bet is that LLMs will get better at it. I can see two limitations with this approach:
- No reliable grounding or bounding boxes (for now)
- Context length (we have a solution for this, similar to Zerox by Omni)
Even if foundation models do not solve OCR completely and reliably in the long run, we still have the option to develop custom solutions or to integrate with mature players.
I’d love to connect with other founders as well.