eeasss | 3 months ago

Are there any LLMs in particular that work best with G-Eval?

lyuata|3 months ago

An LLM benchmark leaderboard for common evals sounds like a fun idea to me.

zlatkov|3 months ago

I haven’t come across any research showing that a specific LLM consistently outperforms others as a G-Eval judge. G-Eval generally works best with strong reasoning models that produce consistent outputs.
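
One rough way to check the "consistent outputs" part yourself is to re-run the same G-Eval prompt several times per judge model and compare how much the scores move. The sketch below only does the comparison step: the judge names and scores are hypothetical placeholders, and collecting real scores from a model is left out, since that depends on whichever client you use.

```python
from statistics import mean, pstdev


def score_consistency(scores_by_model):
    """Map each judge to (mean score, population std dev) over repeated runs."""
    return {
        model: (mean(scores), pstdev(scores))
        for model, scores in scores_by_model.items()
    }


def most_consistent(scores_by_model):
    """Return the judge whose repeated scores vary the least."""
    summary = score_consistency(scores_by_model)
    return min(summary, key=lambda model: summary[model][1])


# Hypothetical data: five repeated scores (1-5 scale) for the same input.
example = {
    "judge_a": [4, 4, 4, 4, 4],
    "judge_b": [2, 5, 3, 4, 1],
}
```

A low spread only says the judge is stable, not that it is right, so it is probably worth pairing this with a small set of human-labeled examples.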