

sinandrei | 1 month ago

Has anyone used these approaches with academic PDFs?


amelius|1 month ago

Anyone using them for electronics datasheets?

bradfa|1 month ago

I would like to. I haven't yet found a solution that works well.

The problems with datasheets are that tables span multiple pages, diagrams and plots are embedded as images, they're generally PDFs, and only some of them use a 2-column layout.

Converting from PDF to markdown while retaining tables correctly seems to work well for me with Mistral's latest OCR model, but this isn't an open model. Using docling with different models has produced much worse results.