Small models don't "know" as much, so they hallucinate more. They are better suited to generation that is grounded in a source of truth, as in a RAG setup.

A better comparison might be Flash 2.0 vs 4o-mini. Even then, these models aren't meant to have vast world knowledge, so benchmarking them on it isn't a great indicator of how they would perform in real-world use.

ipsum2|1 year ago

Yes, it's not an apples-to-apples comparison. My point is that its position on the lmarena leaderboard is higher than it should be, given the hallucination issues.