grog454 | 2 months ago
I don't understand the value of a public post discussing their results beyond maybe entertainment. We have to trust you implicitly and have no way to validate your claims.
> There is no "winning" at benchmarks, it's simply that it is a better and more repeatable evaluation than the old "vibe test" that people did in 2024.
Then you must not be working in an environment where a better benchmark yields a competitive advantage.
eru | 2 months ago
In principle, we have ways: if nl's reports consistently predict how public benchmarks will turn out later, they can build up a reputation. Of course, that requires that we follow nl around for a while.