Benchmarks are basically straight-up meaningless at this point, in my experience. If they were the whole story, those Chinese open models would be stomping the competition right now. Instead they're merely decent when you use them in anger for real work. I'll withhold judgement until I've tried to use it.
phatfish|10 days ago
That sounds so broad that creating a meaningful benchmark is probably as difficult as creating an AI that actually "solves" those domains.
avereveard|10 days ago
girvo|10 days ago