top | item 41872735

mawildoer | 1 year ago

It's interesting how it ignores things like headers and footers. LLMs have an edge there in "deciding" whether to include something in the output or not.

It'd be great if your hosted version also accepted a URL to a PDF and gave a permalink to the result (if you're looking for upgrades).

themanmaran|1 year ago

I've noticed the same issue with it "deciding" what to include, despite explicit instructions in the prompt to include all text on the page.

This is one of the items that can hopefully be resolved with fine-tuning.

mawildoer|1 year ago

I thought it was a big upgrade. I compared Zerox with Unstructured on the first 5 pages of [this datasheet](https://www.ti.com/lit/ds/symlink/lm5117.pdf): Zerox gave me what I wanted, while Unstructured put a bunch of extra junk at the top that was harder to sort through.