pronoiac | 1 year ago
I'd also tried ocrit, which uses Apple's Vision framework for OCR, with some success - https://github.com/insidegui/ocrit
It's an ongoing, iterative process. I'll watch this thread with interest.
Some recent threads that might be helpful:
* https://news.ycombinator.com/item?id=42443022 - Show HN: Adventures in OCR
* https://news.ycombinator.com/item?id=43045801 - Benchmarking vision-language models on OCR in dynamic video environments - driscoll42 posted some stats from research
* https://news.ycombinator.com/item?id=43043671 - OCR4all
(Meaning, I have these browser tabs open, but I haven't fully digested them yet.)
lherron | 1 year ago
https://news.ycombinator.com/item?id=42952605 - Ingesting PDFs and why Gemini 2.0 changes everything
kingkongjaffa | 1 year ago
I can't help but think a few amateur humans could have read the PDF with their eyes and written the Markdown by hand if the OCR was a little sketchy.
pronoiac | 1 year ago