Ok, but then your "post" isn't scientific by definition, since it cannot be verified. "Post" is in quotes because I don't know what you're trying to do, but you're implying some sort of public discourse.
My question was "What's the value of a secret benchmark to anyone but the secret holder?"
The root of this whole discussion was a post about how Gemini 3 outperformed other models on some presumably informal question benchmark (a "vibe test"?). When asked for the benchmark, the OP and someone else responded that secrecy was needed to protect the benchmark from contamination. I'm skeptical of the need in the OP's case, and I'm skeptical of the effectiveness of secrecy in general. In a case where secrecy has actual value, why discuss the benchmark publicly at all?
eru|2 months ago
grog454|2 months ago
1. What is the purpose of the benchmark?
2. What is the purpose of publicly discussing a benchmark's results but keeping the methodology secret?
To me it's in the same spirit as claiming to have defeated AlphaZero but refusing to share the game.
nl|2 months ago
> A secret benchmark is: Useful for internal model selection
That's what I'm doing.
grog454|2 months ago