This is actually the thing I really desperately need. I'm routinely analyzing contracts that were faxed to me, scanned at monstrously poor resolution, wet-signed, all kinds of shit. The big LLM providers choke on this raw input and I burn up the entire context window on 30 pages of text. Understandable evals of the quality of these OCR systems (which are moving wicked fast) would be helpful... And here's the kicker: I can't afford mistakes. Missing a single character, or misinterpreting one, could be catastrophic. 4 units vacant? 10 days to respond? Signature missing? Incredibly critical things. I can't find an eval that gives me confidence around this.
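The kind of eval the commenter is asking for can be sketched at its simplest as a character error rate (CER) check against a hand-verified ground truth. This is a minimal illustration, not any particular benchmark's methodology; the function names are ours:

```python
# Minimal sketch of a character-level OCR eval: character error rate (CER)
# against a hand-verified ground-truth transcription.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ground_truth: str, ocr_output: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ground_truth:
        return 0.0 if not ocr_output else 1.0
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)

# "4 units vacant" misread as "1 units vacant": one substitution in 14 chars.
print(round(cer("4 units vacant", "1 units vacant"), 3))  # → 0.071
```

Note that even a CER of 0.1% is useless here if the one wrong character is the "4" in "4 units vacant" — which is why aggregate error rates alone don't give the confidence the commenter wants.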
daveguy|18 days ago
kergonath|18 days ago
Isn’t this close to the error rate of human transcription for messy input, though? I seem to remember a figure in that ballpark. If your use case is this sensitive, then any transcription is suspect.
coder543|18 days ago
But, as others have said, if you can't afford mistakes, then you're going to need a human in the loop to take responsibility.
staticman2|18 days ago
I can feed it a multi-page PDF and tell it to convert it to Markdown, and it does this well. I don't need to load the pages one at a time as long as I use the PDF format. (This was tested in AI Studio, but I think the API works the same way.)
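The whole-PDF-in-one-request approach described above can be sketched as the request body for Gemini's `generateContent` REST endpoint, with the PDF inlined as base64. The payload shape here is an assumption based on the public Gemini docs, and the prompt text is illustrative:

```python
import base64
import json

def pdf_to_markdown_payload(pdf_bytes: bytes) -> dict:
    """Build a generateContent-style request body that inlines a whole PDF.

    Assumed shape: one content with an inline_data part (the PDF) plus a
    text part carrying the conversion prompt.
    """
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "application/pdf",
                    "data": base64.b64encode(pdf_bytes).decode("ascii"),
                }},
                {"text": "Convert this PDF to Markdown. Preserve all "
                         "numbers, dates, and signature blocks exactly."},
            ]
        }]
    }

# Placeholder bytes stand in for a real scanned contract.
payload = pdf_to_markdown_payload(b"%PDF-1.4 ...")
print(json.dumps(payload)[:40])
```

Because the document travels as one part rather than per-page images, the model sees the full 30 pages in a single request, which matches the behavior described in the comment.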