top | item 40989197


nirga | 1 year ago

I think it depends on the use case and how you define hallucinations. We've seen our metrics perform well (i.e., they correlate with human feedback) for use cases like summarization, RAG question-answering pipelines, and entity extraction.

At the end of the day, things like "answer relevancy" are pretty dichotomous, in the sense that it will be fairly clear to a human evaluator whether an answer actually addresses the question.
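Treating relevancy as a binary judgment makes validation straightforward: you can measure how often the metric agrees with human labels, correcting for chance agreement. A minimal sketch with made-up labels (not the commenter's actual pipeline or data):

```python
# Sketch: validating a binary "answer relevancy" metric against human
# feedback via raw agreement and Cohen's kappa. Labels are hypothetical.

def agreement(metric, human):
    """Fraction of examples where the metric matches the human label."""
    return sum(m == h for m, h in zip(metric, human)) / len(metric)

def cohens_kappa(metric, human):
    """Chance-corrected agreement between two binary raters."""
    n = len(metric)
    p_o = agreement(metric, human)          # observed agreement
    p_m, p_h = sum(metric) / n, sum(human) / n
    p_e = p_m * p_h + (1 - p_m) * (1 - p_h)  # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

# 1 = "answer is relevant", 0 = "not relevant" (invented for illustration)
metric_labels = [1, 1, 0, 1, 0, 0, 1, 1]
human_labels  = [1, 1, 0, 0, 0, 0, 1, 1]

print(agreement(metric_labels, human_labels))     # 0.875
print(cohens_kappa(metric_labels, human_labels))  # 0.75
```

Kappa above ~0.6 is conventionally read as substantial agreement, which is roughly what "correlates with human feedback" cashes out to in practice.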

I wonder if you can elaborate on why you claim that hallucinations can't be detected with any certainty.
