Only remotely related, but I have yet to find a reliable solution for table page classification, i.e. a classifier that returns the indices of the pages in a document that contain tables.

Solutions built on things like img2table or pymupdf are really bad (pymupdf is not even reliable for text PDFs).

djoldman|5 months ago

In my experience, this task is incredibly difficult to solve in general.

Handcrafting rules around the specific dataset is the only way to get high performance.
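
To make "handcrafting" concrete, here is a rough sketch of the kind of dataset-specific rule I mean. It is not a general solution. It assumes you already have word boxes per page from some extractor that works on your documents, and the input format, function names, and thresholds (`Word`, `looks_like_table`, `table_page_indices`, `min_rows`, `min_cols`, `x_tol`, `y_tol`) are all made up for illustration and would need tuning per dataset.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: each page is a list of (x0, y0, text) word boxes,
# produced by whatever text extractor works for your documents.
Word = Tuple[float, float, str]


def looks_like_table(words: List[Word], min_rows: int = 3, min_cols: int = 3,
                     x_tol: float = 5.0, y_tol: float = 3.0) -> bool:
    """Guess whether a page holds a table: several rows whose words start
    at the same x positions, i.e. aligned columns."""
    # Group words into rows by bucketing the y coordinate, and bucket the
    # x coordinate so nearly aligned words land on the same column key.
    rows: Dict[int, List[int]] = {}
    for x0, y0, _ in words:
        rows.setdefault(round(y0 / y_tol), []).append(round(x0 / x_tol))

    # Count in how many multi-column rows each column position appears.
    col_hits: Counter = Counter()
    for xs in rows.values():
        distinct = set(xs)
        if len(distinct) >= min_cols:
            col_hits.update(distinct)

    # Call it a table if enough column positions recur across enough rows.
    shared = [x for x, n in col_hits.items() if n >= min_rows]
    return len(shared) >= min_cols


def table_page_indices(pages: List[List[Word]], **kwargs) -> List[int]:
    """Return the 0-based indices of pages that look like they hold a table."""
    return [i for i, words in enumerate(pages) if looks_like_table(words, **kwargs)]
```

Known weak spots: fixed-width bucketing can split values that sit near a bucket edge, justified or multi-column prose can look like a table, and scanned pages need OCR first. That is exactly why this kind of rule ends up being tuned per dataset.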