kergonath | 18 days ago

Interesting. What kind of layout do you have?

My documents have one- or two-column layouts, often inconsistently across pages or even within a page (which tripped up older layout detection methods). Most models seem to handle that well, so they are good enough for my use case.

chaps|18 days ago

Documents that come from FOIA requests. So, some scanned, some not. Lots of forms, and lots of handwriting adding info that the form's fields don't accommodate. Lots of repeated documents, but also lots of one-off documents that have high signal.

pogue|18 days ago

I'd be very curious what works well with historical FOIA documents that have been scanned by hand, with redactions made by marker, etc.