(no title)
kurts_mustache | 2 years ago
1. Signing up with email just redirected endlessly: click the link in the email, get asked to sign up with email, put in the email, click the link in the email, etc.
2. Fine, I'll sign in with Google.
3. A PDF parser? Seriously, that's what all this fuss is about? There are so many options already out there: PDFBox, iText, Unstructured, PyPDF, PDF.js, and PDFMiner, not to mention the extraction services available from the hyperscalers. Super confused why anyone needs this.
verdverm|2 years ago
Specific to the parser, they do show where tools like the ones you mentioned fail, and how their LLM-based parser captures the data those tools miss.
kurts_mustache|2 years ago
It's easy to cherry-pick a PDF for marketing purposes and claim you're better. I didn't miss it; I just don't take marketing announcements at face value. I tried their parser on a PDF with a bit of complex formatting (multiple columns, tables, and a couple of images) and it choked, spitting out one big markdown header with jumbled text. Not impressed.
cheesyFishes|2 years ago
Planning broader releases in the future for sure.
asukla|2 years ago
Great work by the LlamaIndex team. Also feel free to try https://github.com/nlmatics/llmsherpa which takes into account some of the things I mentioned.