Even with the axis starting at 30%, the MMLU graph is wrong. All four bars are drawn at the wrong height. Even their own 73.7% is not at the right height, and Mixtral's 71.4% sits below the 70% mark on the axis. This is really the kind of marketing trick that makes me avoid a provider / publisher. I can't build trust this way.
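
To make the complaint concrete, here is a rough sketch of where the bars should land on a linear axis that starts at 30%. The 30% baseline and the 73.7% / 71.4% scores come from the comment above; the 80% upper limit and the helper name `bar_height_fraction` are assumptions for illustration only, not anything taken from the actual chart.

```python
def bar_height_fraction(value, axis_min, axis_max):
    """Fraction of the plot height a bar should fill on a linear axis."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (value - axis_min) / (axis_max - axis_min)


AXIS_MIN = 30.0  # from the comment: the axis starts at 30%
AXIS_MAX = 80.0  # assumption: the chart's real upper limit is not stated

# Scores quoted in the comment above.
scores = {"their own model": 73.7, "Mixtral": 71.4}

# Height of the 70% gridline, used as the reference mark.
tick_70 = bar_height_fraction(70.0, AXIS_MIN, AXIS_MAX)

for name, score in scores.items():
    frac = bar_height_fraction(score, AXIS_MIN, AXIS_MAX)
    position = "above" if frac > tick_70 else "at or below"
    print(f"{name}: {frac:.3f} of plot height, {position} the 70% mark")
```

Whatever upper limit is chosen, a 71.4% bar has to end above the 70% gridline on a linear axis, which is the inconsistency being pointed out.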


tylermw|1 year ago

I believe they are using the percentages as part of the height of the bar chart! I thought I'd seen every way someone could do dataviz wrong (particularly with a bar chart), but this one is new to me.

familiartime|1 year ago

That's really strange and incredibly frustrating, but slightly less so if it's consistent across all of the bars (including their own).

I take issue with their choice of bar ordering - they placed the lowest-performing model directly next to theirs to make the gap as visible as possible, and shoved the second-best model (Grok-1) as far from theirs as possible. Seems intentional to me. The more marketing tricks you pile up in a dataviz, the less trust I place in your product for sure.

pandastronaut|1 year ago

Interesting! It is probably one of the worst tricks I have seen in a while for a bar graph. Never seen this one before. Trust vanishes instantly when facing that kind of dataviz.

radicality|1 year ago

Wow, that is indeed a novel approach haha. It took me a moment to even understand what you described, since I would never imagine someone plotting a bar chart like that.

occamrazor|1 year ago

It's more likely to be incompetence than malice: even their 73.7% is closer to 72% than to 74%.

nerpderp82|1 year ago

MMLU is not a good benchmark and needs to stop being used.

I can't find the section, but at the end of one of the videos at https://www.youtube.com/@aiexplained-official/videos he does a deep dive into the questions and answers in MMLU, and there are so many typos, omissions, and errors in the questions and the answers that it should no longer be used.

This is it, with the correct time offset into the video: https://www.reddit.com/r/OpenAI/comments/18i02oe/mmlu_is_not...

The original, longer complaint against MMLU: https://www.youtube.com/watch?v=hVade_8H8mE

dskhudia|1 year ago

It’s an honest mistake in scaling the bars. It’s getting fixed soon. The percentages are correct though. In the process of converting the Excel chart to pretty graphs for the blog, the scale got messed up.

tartrate|1 year ago

Seems fixed now