samuria | 1 year ago

Interesting, I wanted to do this for a personal use case (mostly learning), but with PDFs. What's the tech stack? I have explored using the AWS AI tools, but they seem like overkill for what I want to do.

lou1306|1 year ago

If the PDFs are textual or already have an OCR text layer, then pdftotext from the Poppler suite ought to be enough? If not, add Tesseract/OCRmyPDF to the pipeline?
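
For what it's worth, here is a rough sketch of that pipeline in Python, assuming the `pdftotext` and `ocrmypdf` command-line tools are installed and on the PATH. The function names, the `.ocr.pdf` output name, and the 50-character threshold for deciding that a PDF has no usable text layer are all made up for illustration; tune them to your documents.

```python
import subprocess
from pathlib import Path

# Hypothetical threshold: below this many characters we assume the PDF
# has no usable text layer and needs OCR.
MIN_CHARS = 50


def pdftotext_cmd(pdf):
    """Build the Poppler command; '-' sends the extracted text to stdout."""
    return ["pdftotext", "-layout", str(pdf), "-"]


def ocr_cmd(src, dst):
    """Build the OCRmyPDF command; --skip-text leaves pages that already have text alone."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def needs_ocr(text, min_chars=MIN_CHARS):
    """Treat near-empty output (whitespace, form feeds) as 'no text layer'."""
    return len(text.strip()) < min_chars


def run_pdftotext(pdf):
    result = subprocess.run(
        pdftotext_cmd(pdf), capture_output=True, text=True, check=True
    )
    return result.stdout


def extract_text(pdf):
    """Try plain extraction first; fall back to OCR only when it yields almost nothing."""
    pdf = Path(pdf)
    text = run_pdftotext(pdf)
    if needs_ocr(text):
        ocr_pdf = pdf.with_name(pdf.stem + ".ocr.pdf")
        subprocess.run(ocr_cmd(pdf, ocr_pdf), check=True)
        text = run_pdftotext(ocr_pdf)
    return text
```

Calling `extract_text("scan.pdf")` (any local PDF path) returns the text as a string. OCRmyPDF uses Tesseract under the hood, so this only pays the OCR cost for files where plain extraction comes back empty.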

tompec|1 year ago

Tech stack is a mix of serverless Laravel, with Cloudflare and AWS functions, and some Pinecone for vector storage. Still experimenting with a few things, but I don't want to over-engineer until I know where I'm going.

stevenicr|1 year ago

Given that Cloudflare spies on traffic and reports to multiple agencies on its findings, perhaps a breakdown of the chain and the privacy implications of each block in the stack would be beneficial?