(no title)
jessfyi | 1 year ago
In the nicest way possible I'm saying this form of preference testing is ultimately useless, primarily due to a base of dilettantes with more free time than knowledge parading around as subject matter experts and secondarily due to presumed malfeasance. The latter is more apparent to more of the masses (that don't blindly believe any leaderboard they see) now that access to the model itself is more widespread and people are seeing the performance doesn't match the "revolution" promised [0]. If you're still confused why selecting a model based on a glorified Hot or Not application is flawed, perhaps ask yourself why other evals exist in the first place (hint: some tests are harder than others.)
[0](One such instance of someone competent testing it and realizing it's not even close to the "best" model out) https://www.youtube.com/watch?v=WVpaBTqm-Zo
Breza|1 year ago
sigmoid10|1 year ago