nxrabl | 8 months ago

Very interesting! Is this the state of the art for accurate OCR of tabular PDFs, or is there other work in the space to compare against?

SnooSux|8 months ago

There are lots of posts on HN about developments and companies doing OCR and document extraction. It's a classic CV problem, but it has still come a long way in the past couple of years.

dwillis|8 months ago

Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone was the goal, it would probably be AWS Textract. But accuracy and structure? Gemini.