Yes. A smaller model isn't worse by default: it may have been trained for less time or on less data, which in some cases can be beneficial. The differences here are small, so they may not be statistically significant. Also, a model is doing the evaluation, so while its judgments are highly correlated with human ones, a difference this small doesn't necessarily mean that the 7B model is better.
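
If you want a rough check on whether a gap this small is meaningful, one option is an exact sign test on head-to-head preferences. This is only a sketch, assuming the evaluation gives you a count of prompts where each model was preferred (ties dropped). The function name and the 52 vs. 48 counts are made up for illustration and are not taken from the results being discussed.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test (hypothetical helper).

    Null hypothesis: each model is equally likely to be preferred
    on any given prompt. Ties are assumed to have been dropped.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no decisive comparisons, so no evidence either way
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1 for a two-sided p-value
    return min(1.0, 2 * tail)


# Illustrative counts only: 7B preferred on 52 prompts, larger model on 48
p = sign_test_p_value(52, 48)
print(f"p-value: {p:.3f}")
```

A large p-value here would mean the split is consistent with the two models being equally good. Note that this only addresses sampling noise; it says nothing about bias in the judge model itself.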