Maybe I am missing something obvious on the website, but where is the documentation? Where do you explain what each number means, or at least give a short overview of what the models are being tested on?
You can hover over some stuff, click on a model to get more info like tested categories, or hover over the correct test numbers to see some info about what it got wrong.
I just started on this, so I'm currently adding more tests and improving the UI. Let me know if you have any suggestions.
The ranking is currently mostly about the "smartest" model, i.e. the one most likely to respond correctly to any given question or request, regardless of the domain.
XCSme|4 days ago