top | item 42817137

(no title)

jonnycoder | 1 year ago

I am doing OCR on hundreds of PDFs using AWS Textract. It requires me to convert each page of the pdf to an image and then analyze the image and it works good for converting to markdown format (which requires custom code). I want to try using some vision models and compare how they do, for example Phi-3.5-vision-instruct.

discuss

order

No comments yet.