jellyotsiro | 1 month ago
On LLMs vs traditional NLP: I hear you, and I've seen similar issues with LLM hallucination on structured data. That's why the architecture here is hybrid:
- Traditional exact regex/grep search for names, dates, identifiers
- Vector search for semantic queries
- LLM orchestration layer that must cite sources and can't generate answers without grounding
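Roughly, the routing looks something like this (a simplified sketch; the names, patterns, and stubbed vector search are illustrative, not production code):

```python
import re
from typing import Dict, List

# Toy corpus standing in for the indexed documents.
DOCS = [
    {"id": "doc-1", "text": "Invoice 2024-0042 issued to Jane Doe on 2024-03-15."},
    {"id": "doc-2", "text": "The quarterly report discusses revenue growth trends."},
]

# Tokens that signal a structured lookup (dates, identifiers, proper names)
# rather than a semantic query.
STRUCTURED = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{4}-\d{4}|[A-Z][a-z]+ [A-Z][a-z]+)\b")

def exact_search(query: str) -> List[Dict]:
    """grep-style pass: return docs containing the literal token."""
    tokens = STRUCTURED.findall(query)
    return [d for d in DOCS if any(t in d["text"] for t in tokens)]

def vector_search(query: str) -> List[Dict]:
    """Stub for an embedding lookup; swap in a real vector store here."""
    return DOCS[:1]  # pretend: top-1 by cosine similarity

def answer(query: str) -> Dict:
    hits = exact_search(query) if STRUCTURED.search(query) else vector_search(query)
    if not hits:
        # The "can't answer without grounding" rule: refuse, don't guess.
        return {"answer": None, "sources": []}
    return {"answer": f"Based on {hits[0]['id']}: ...",
            "sources": [h["id"] for h in hits]}

print(answer("When was invoice 2024-0042 issued?"))
```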
sebastiennight | 1 month ago
"can't" seems like quite a strong claim. Would you care to elaborate?
I can see how one might use a JSON schema that enforces source references in the output, but I'm not aware of any technique that constrains a model to draw only on the grounding docs, as opposed to generating a response from pretrained data (or hallucinating one) and still listing the provided RAG results as attached references.
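For concreteness, here's roughly what that schema-plus-validation step might look like (a hypothetical sketch using the jsonschema package; the schema and document IDs are made up):

```python
import json
import jsonschema  # pip install jsonschema

# Schema forcing every answer to carry at least one source reference.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "sources": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
    "required": ["answer", "sources"],
}

def citations_valid(raw: str, retrieved_ids: set) -> bool:
    out = json.loads(raw)
    jsonschema.validate(out, ANSWER_SCHEMA)        # shape check only
    return set(out["sources"]) <= retrieved_ids    # IDs exist in the RAG set

# A model could emit this from pretrained knowledge and still pass:
fabricated = json.dumps({"answer": "Revenue grew 40%.", "sources": ["doc-2"]})
print(citations_valid(fabricated, {"doc-1", "doc-2"}))  # True
```

The fabricated answer passes both checks, because nothing in them verifies that the answer text was actually derived from doc-2.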
It feels like your "can't" would be tantamount to having single-handedly solved the problem of hallucinations, which, if you had, would be a billion-dollar-plus unlock for you, so I'm not sure you should show that level of certainty.