arvind_k | 4 months ago

At Zipphy, I worked on similar problems in on-prem environments, building an OCR + NLP + CV pipeline to generate spatial layouts and classify documents at scale.

One persistent challenge was generalizing across “wild” PDFs, especially multi-page tables.

Your mention of agentic OCR correction and semantic chunking caught my attention. How did you architect those to stay consistent across diverse layouts without relying on massive rule sets?
