
(no title)

whistle650 | 6 months ago

It seems they use 70% of the benchmark query-answer pairs to cluster the queries and determine which models work best for each cluster (by sending all queries to all models and comparing the responses against the ground-truth answers). Then they route the remaining 30% "test" set queries according to those prior determinations. It doesn't seem surprising that this approach would give you Pareto efficiency on those same benchmarks.
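To make the procedure described above concrete, here is a rough sketch of cluster-then-route as I read it. Everything in it is an assumption for illustration: the function names (`fit_router`, `route`), the idea that queries arrive as small numeric embedding vectors, the plain k-means, and the rule of breaking accuracy ties in favor of the cheaper model. It is not the authors' code.

```python
import random
from collections import defaultdict

def dist2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, point):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i], point))

def kmeans(points, k, iters=20, seed=0):
    # Plain k-means; initial centroids are sampled from the data.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(groups[i]) for col in zip(*groups[i])]
            if groups[i] else centroids[i]
            for i in range(k)
        ]
    return centroids

def fit_router(train_points, train_correct, cost, k):
    # train_correct[model][j] is True if that model answered train query j
    # correctly (judged against the ground-truth answer).
    centroids = kmeans(train_points, k)
    hits = defaultdict(lambda: defaultdict(int))
    for j, p in enumerate(train_points):
        c = nearest(centroids, p)
        for model, correct in train_correct.items():
            hits[c][model] += int(correct[j])
    # Per cluster: highest accuracy wins, cheaper model wins ties (assumed rule).
    best = {
        c: max(train_correct, key=lambda m: (scores[m], -cost[m]))
        for c, scores in hits.items()
    }
    return centroids, best

def route(centroids, best, point, default):
    # Send a held-out query to the model chosen for its nearest cluster.
    return best.get(nearest(centroids, point), default)

# Toy "train" split: two well-separated groups of queries (made-up data).
train_points = [[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]]
train_correct = {
    "small": [True] * 4 + [False] * 4,  # only handles the first group
    "large": [True] * 8,                # handles everything
}
cost = {"small": 1, "large": 10}
centroids, best = fit_router(train_points, train_correct, cost, k=2)
```

With this toy data, a held-out query near the first group goes to the cheap model and one near the second group goes to the expensive one. The per-cluster choice is fitted on queries drawn from the same benchmark as the held-out ones, which is why a good result on that held-out split is not very surprising.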


visarga | 6 months ago

That's OK if you can update the router over time; the more data you have, the better.