hugodutka | 1 year ago
PyMuPDF, a PDF library for Python.
jimmySixDOF | 1 year ago
A different approach from vanilla OCR/parsing seems to be ColPali, which pairs a purpose-built small vision model with ColBERT-style indexing for retrieval. So, if search is the intended use case, it can skip the OCR step entirely. [1]
[1] https://huggingface.co/blog/manu/colpali
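For anyone wondering what "ColBERT-type indexing" means in practice, here is a rough sketch of late-interaction (MaxSim) scoring, which is the general idea behind ColBERT-style retrieval. The vectors are toy numbers rather than real model outputs, and `maxsim_score` and `rank_pages` are hypothetical names made up for illustration; a real setup would take the query and page-patch embeddings from the vision model instead.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector, take its best cosine
    match among the page's patch vectors, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in page_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
        for qv in q
    )


def rank_pages(query_vecs, pages):
    """Return (page name, score) pairs sorted from best to worst match."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings (assumption: 2 dimensions, a handful of vectors per page).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "page_b": [[1.0, 0.0], [-1.0, 0.0]],
}

ranking = rank_pages(query, pages)
for name, score in ranking:
    print(name, round(score, 3))
```

The point is that each page is stored as many small vectors rather than one, and matching happens per query token at search time, so no OCR text is needed as long as the page image can be embedded.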