grog454 | 2 months ago
1. What is the purpose of the benchmark?
2. What is the purpose of publicly discussing a benchmark's results but keeping the methodology secret?
To me it's in the same spirit as claiming to have defeated AlphaZero but refusing to share the game.
nl | 2 months ago
2. I discussed that up-thread, but https://github.com/microsoft/private-benchmarking and https://arxiv.org/abs/2403.00393 discuss some further motivation for this if you are interested.
> To me it's in the same spirit as claiming to have defeated AlphaZero but refusing to share the game.
This is an odd way of looking at it. There is no "winning" at benchmarks, it's simply that it is a better and more repeatable evaluation than the old "vibe test" that people did in 2024.
grog454 | 2 months ago
I don't understand the value of a public post discussing their results beyond maybe entertainment. We have to trust you implicitly and have no way to validate your claims.
> There is no "winning" at benchmarks, it's simply that it is a better and more repeatable evaluation than the old "vibe test" that people did in 2024.
Then you must not be working in an environment where a better benchmark yields a competitive advantage.