elyase | 2 years ago

How does it compare to ragas (https://github.com/explodinggradients/ragas)?

tvalmetrics is certainly similar to ragas, and we really like ragas. The two packages differ structurally and in the specific metrics they offer.

With regard to the metrics, we have an end-to-end metric called the answer similarity score, which scores how well your RAG response matches a reference correct response. Last I saw, ragas did not have a score like this, as they focus on scoring RAG responses and context without a reference correct answer. We also have a retrieval k-recall score, which compares the relevance of the retrieved context to the relevance of the top k context chunks, where k is larger than the number of retrieved context chunks. Retrieval k-recall is a good score for tuning how many context chunks your RAG system should retrieve. I do not believe ragas has a score like this.
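To make the retrieval k-recall idea concrete, here is a rough sketch, not the package's actual code. It assumes the score is the number of relevant retrieved chunks divided by the number of relevant chunks among the top k, and that the retrieved chunks are the first n of those top k. The function name, its arguments, and the convention for the zero-relevant case are all made up for illustration.

```python
from typing import Sequence


def retrieval_k_recall(top_k_relevance: Sequence[bool], n_retrieved: int) -> float:
    """Illustrative k-recall: relevant retrieved chunks / relevant top-k chunks.

    top_k_relevance[i] says whether the i-th ranked chunk was judged relevant
    (for example by an LLM evaluator). The first n_retrieved entries are the
    chunks the RAG system actually used. This is an assumed formula, not
    necessarily what tvalmetrics computes.
    """
    if not 0 < n_retrieved <= len(top_k_relevance):
        raise ValueError("n_retrieved must be between 1 and k")
    relevant_in_top_k = sum(top_k_relevance)
    if relevant_in_top_k == 0:
        # Nothing relevant exists in the top k, so nothing was missed.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    relevant_retrieved = sum(top_k_relevance[:n_retrieved])
    return relevant_retrieved / relevant_in_top_k


# Top 5 chunks, of which the system retrieved the first 3.
score = retrieval_k_recall([True, False, True, True, False], n_retrieved=3)
```

A score well below 1 suggests relevant chunks sit just past the retrieval cutoff, so retrieving more chunks may help. A score near 1 suggests the current cutoff already captures most of what is relevant in the top k.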

Structurally, tvalmetrics does not use langchain, while ragas does. We chose not to use langchain for our LLM API calls to keep the code in the package clear and to make it easy for a user to understand exactly how the LLM-assisted evaluation works for each metric. Of course, the drawback of not using langchain is that our package is not integrated with as many LLMs. Currently, we only support OpenAI chat completion models as LLM evaluators to calculate the metrics. We plan on adding support for additional LLM evaluators very soon.
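As a rough illustration of what a framework-free, LLM-assisted metric can look like, here is a minimal sketch of a reference-based similarity score. Everything in it is an assumption made for illustration: the prompt wording, the 0 to 5 scale, the function name, and the `complete` callable, which stands in for whatever function sends a prompt to a chat model and returns its text reply. It is not the package's actual implementation.

```python
import re
from typing import Callable

# Hypothetical prompt; the real package's prompt may differ.
PROMPT = (
    "Rate from 0 to 5 how closely the new answer matches the reference answer "
    "for the question. Reply with the number only.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "New answer: {answer}\n"
)


def answer_similarity(
    question: str,
    reference: str,
    answer: str,
    complete: Callable[[str], str],
) -> float:
    """Ask an LLM evaluator for a similarity score and parse it.

    `complete` is a hypothetical callable that takes a prompt string and
    returns the model's reply as text.
    """
    prompt = PROMPT.format(question=question, reference=reference, answer=answer)
    reply = complete(prompt)
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"could not parse a score from: {reply!r}")
    # Clamp to the assumed 0-5 range in case the model strays outside it.
    return min(max(float(match.group()), 0.0), 5.0)


# Usage with a stub evaluator in place of a real model call.
stub_score = answer_similarity(
    "What is the capital of France?",
    "Paris",
    "The capital is Paris.",
    complete=lambda prompt: "5",
)
```

Keeping the model call behind a plain callable means the whole evaluation path is a prompt, one call, and one parse, which is easy to read and to swap for a different LLM provider.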