top | item 43333251

pvo50555 | 11 months ago

What differentiates this from Open WebUI? How did you design the RAG pipeline?

I once had a project with hundreds of PDF and HTML files of industry safety and fatality reports, which I hoped to simply "throw in" and use with Open WebUI, but I found it wasn't effective at this, even in RAG mode. I wanted to ask it questions like "How many fatalities occurred in 2020 that involved heavy machinery?", but it couldn't answer that kind of broad aggregate question.
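To make the aggregate problem concrete, here is a minimal sketch of why a top-k retriever struggles with "how many" questions. Everything in it is an assumption for illustration: the `Report` fields, the toy word-overlap retriever, and the made-up corpus are hypothetical and are not how Open WebUI works internally. The point is only that a count derived from retrieved context can never exceed k, while a count over fields extracted once at ingest time sees every document.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical fields, assumed to be extracted from each report at ingest time.
    year: int
    equipment: str
    fatalities: int
    text: str


def make_corpus():
    """Build a made-up corpus: 12 matching reports plus 13 distractors."""
    reports = []
    for i in range(12):
        reports.append(Report(2020, "heavy machinery", 1,
                              f"2020 report {i} fatality involving heavy machinery"))
    for i in range(5):
        reports.append(Report(2020, "ladder", 1,
                              f"2020 report {i} fatality involving a ladder fall"))
    for i in range(8):
        reports.append(Report(2019, "heavy machinery", 1,
                              f"2019 report {i} fatality involving heavy machinery"))
    return reports


def top_k(reports, query, k=4):
    """Toy retriever: rank by word overlap with the query and keep only k reports."""
    q = set(query.lower().split())
    ranked = sorted(reports,
                    key=lambda r: len(q & set(r.text.lower().split())),
                    reverse=True)
    return ranked[:k]


def count_from_context(chunks, year, equipment):
    """Best case for an LLM that only sees the retrieved chunks."""
    return sum(r.fatalities for r in chunks
               if r.year == year and r.equipment == equipment)


def count_structured(reports, year, equipment):
    """Aggregate over the extracted fields of every report, no retrieval step."""
    return sum(r.fatalities for r in reports
               if r.year == year and r.equipment == equipment)


corpus = make_corpus()
retrieved = top_k(corpus, "fatalities in 2020 involving heavy machinery", k=4)
rag_count = count_from_context(retrieved, 2020, "heavy machinery")
full_count = count_structured(corpus, 2020, "heavy machinery")
print(rag_count, full_count)
```

With this toy data the retrieval path is capped at the 4 reports it was handed, while the structured path counts all 12. Raising k only moves the cap; for aggregate questions it is usually more reliable to extract fields up front and query them like a table.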

phren0logy|11 months ago

I think this is a fundamental issue with naive RAG implementations: they aren't accurate enough for pretty much anything.

kridsdale1|11 months ago

Ultimately, the quality of OCR on PDFs is where we are bottlenecked as an industry. And not just in recognizing text characters, but in understanding structured object relationships, like those in tables and graphs, and feeding them to the LLM. Intuitive for a human, very error-prone for RAG.