top | item 39900759

shekhar101 | 1 year ago

What's the name of the layout recogniser model? I did not have a good experience extracting layout from tables, especially those without column boundaries (whitespace instead of lines to demarcate boundaries).

mpeg | 1 year ago

It's https://huggingface.co/InfiniFlow/deepdoc and the code for using it is in https://github.com/infiniflow/ragflow/blob/main/deepdoc/READ... (it took me a bit of trial and error to get it working).

It seems to be a YOLOv8 fine-tune. I only ran a couple of tests, but the results were decent. Another model that is supposed to be fine-tuned for borderless tables is https://huggingface.co/keremberke/yolov8m-table-extraction. I haven't had great results with it myself, but it may be worth a try for you.
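
For a sense of what "borderless" means in practice, here is a minimal non-ML baseline sketch in Python. It is not what either model above does. It assumes you already have the table as monospaced text lines, and it treats any run of character positions that is blank in every row as a column separator. The function names and the `min_gap` parameter are made up for illustration.

```python
from typing import List, Tuple


def find_column_spans(lines: List[str], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of columns, end exclusive.

    Hypothetical helper: a position counts as blank only if every row
    has a space there; a blank run of at least min_gap splits columns.
    """
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # start index of the column currently being read
    gap = 0       # length of the blank run seen just before position i
    for i, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            continue
        if start is None:
            start = i
        elif gap >= min_gap:
            # The blank run was wide enough: close the previous column.
            spans.append((start, i - gap))
            start = i
        gap = 0
    if start is not None:
        # Close the last column, dropping any trailing blank run.
        spans.append((start, width - gap))
    return spans


def split_rows(lines: List[str], spans: List[Tuple[int, int]]) -> List[List[str]]:
    """Cut every line at the detected spans and strip cell padding."""
    return [[line[s:e].strip() for s, e in spans] for line in lines]
```

Pass the text lines of one table to `find_column_spans`, then hand the spans to `split_rows` to get one list of cells per row. This breaks as soon as one row has text crossing a gap (a caption line, a merged cell) or columns are separated by a single space, which is roughly where a detection model becomes worth the effort.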

thegeomaster | 1 year ago

Here's a quick test to run: if you have Windows and MS Office, File->Open your PDF and report the results. You might be surprised at the layout extraction quality.