item 44655485

Kaggle Launches LLM Evals

9 points | antgoldbloom | 7 months ago | kaggle.com

4 comments

art82135|7 months ago

Curious: how does it compare to Chatbot Arena?

meganrisdal|7 months ago

We love what Chatbot Arena is doing to innovate on evaluation paradigms. The challenge of evaluating GenAI warrants diverse approaches. What we're excited to do is: 1) give anyone access to infra to make evaluation more accessible to more developers and researchers; 2) drive more novel, diverse evals. https://arxiv.org/abs/2505.00612v2

benhamner|7 months ago

Can we add our own models or benchmarks?