top | item 46307226

(no title)

sabareesh | 2 months ago

Watch out, these models are hallucinating a lot more: https://artificialanalysis.ai/evaluations/omniscience?omnisc...


joecarpenter|2 months ago

Isn't it the opposite? From the link: Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.

Gemini 3 Flash scored +13 in the test, more correct answers than incorrect.
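To make the quoted scoring rule concrete, here is a rough sketch. It assumes the index is the share of correct answers minus the share of incorrect answers, scaled to 100, with abstentions counting as zero. The function name and the counts are made up for illustration; this is not Artificial Analysis's code or data.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Net-correct score on a -100..100 scale (assumed formula, see above)."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one question")
    # Correct answers add, incorrect answers subtract, abstentions count as zero.
    return 100.0 * (correct - incorrect) / total


# Hypothetical counts: as many right as wrong gives 0, more right than wrong is positive.
print(omniscience_index(50, 50, 0))
print(omniscience_index(40, 27, 33))
```

Under this assumed formula, a positive score only says that correct answers outnumber incorrect ones; it says nothing about how often the model guesses instead of abstaining.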

sabareesh|2 months ago

Nope. I am looking at AA-Omniscience Hallucination Rate, where lower is better. Compared to recent OpenAI models, this is bad.

nemonemo|2 months ago

One thing I don't understand is why Gemini Pro appears much cheaper than Gemini Flash in the scatter plot.

andai|2 months ago

This model has the best score on that benchmark.

Edit: Huh... It does score highest in "Omniscience", but it is also very high in Hallucination Rate (where a higher score is worse)...
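Both things can be true at once. A minimal sketch, assuming hallucination rate means the fraction of non-correct responses where the model answered wrongly instead of abstaining (my reading of the metric, not a confirmed definition). The two models and their counts are hypothetical.

```python
def net_score(correct: int, incorrect: int, abstained: int) -> float:
    """Correct minus incorrect, as a percentage of all questions (assumed)."""
    return 100.0 * (correct - incorrect) / (correct + incorrect + abstained)


def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Assumed definition: wrong answers / (wrong answers + abstentions)."""
    return incorrect / (incorrect + abstained)


# Hypothetical counts out of 100 questions: (correct, incorrect, abstained).
guesser = (55, 40, 5)     # knows more, but almost never abstains
cautious = (40, 30, 30)   # knows less, but abstains half the time it is unsure

for name, (c, i, a) in {"guesser": guesser, "cautious": cautious}.items():
    print(name, net_score(c, i, a), round(hallucination_rate(i, a), 2))
```

In this made-up example the "guesser" gets the higher net score and the higher hallucination rate, which is the pattern described above.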

sabareesh|2 months ago

This has one of the worst scores in AA-Omniscience Hallucination Rate.