Maybe it was my prompt, but there seems to be far too much interpretation happening after the image embedding. In my tests it implicitly started summarizing parts of the text, unfortunately incorrectly. On an invoice with typed lettering, it summarized that submitted payments would not post for 2-3 business days, when the text actually said that a payment submitted after 2 p.m. on a Friday would not post until the following Monday. Which is significantly different. I'd be curious whether you could ablate those layers in some way, because the one-shot structured text detection/recognition was much better than vanilla OCR.
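To be concrete about what I mean by "ablate those layers": skip or zero out selected blocks at inference and see whether the raw transcription survives while the summarizing behavior drops out. A toy sketch (the stack of residual "layers" here is entirely made up, not any real VLM):

```python
# Toy illustration of layer ablation: run a stack of residual blocks,
# optionally replacing some of them with the identity, and compare outputs.
# Purely hypothetical; not the actual model under discussion.

def make_layer(scale):
    # each "layer" is a simple residual update: x -> x + scale * x
    return lambda x: x + scale * x

layers = [make_layer(s) for s in (0.1, 0.2, 0.3, 0.4)]

def forward(x, ablate=()):
    # ablated layer indices are skipped (treated as identity)
    for i, layer in enumerate(layers):
        if i not in ablate:
            x = layer(x)
    return x

full = forward(1.0)                      # all layers active
no_late = forward(1.0, ablate={2, 3})    # knock out the last two layers
print(full, no_late)
```

In a real model you'd do the equivalent with forward hooks on the later decoder blocks, then check whether the "2 p.m. Friday" detail comes through verbatim instead of being paraphrased away.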