smeeth | 26 days ago
Not really! Sorry to harp on this, but there are two ways one model could outperform another:
1) It adheres to your strategy better
2) It improvises
If the prompt was "maximize money, here's inspiration," improvising is fine. If the prompt was "implement the strategy," improvising is failure.
Right now you have a leaderboard; you don't yet have a benchmark, because you can't tell whether high P&L reflects faithful execution of the strategy or improvisation.
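To make that concrete, here is a rough sketch of scoring adherence separately from P&L. Everything in it is hypothetical: the `Trade` record, the `adherence` function, and the assumption that the strategy is deterministic enough to generate reference decisions from a plain rule-based implementation. It is not how your harness works, just one way the two cases could be told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    step: int    # index of the decision step
    symbol: str  # instrument identifier (placeholder names below)
    side: str    # "buy" or "sell"; a missing entry is treated as "hold"


def adherence(model_trades, reference_trades):
    """Fraction of (step, symbol) decisions where the model matches the reference.

    Any (step, symbol) pair that appears in only one of the two lists counts
    as a mismatch, because the other side implicitly held.
    """
    ref = {(t.step, t.symbol): t.side for t in reference_trades}
    got = {(t.step, t.symbol): t.side for t in model_trades}
    keys = ref.keys() | got.keys()
    if not keys:
        return 1.0  # nothing to do and nothing done
    matches = sum(1 for k in keys if ref.get(k, "hold") == got.get(k, "hold"))
    return matches / len(keys)


# Hypothetical reference decisions produced by a rule-based version of the strategy.
reference = [Trade(0, "X", "buy"), Trade(2, "X", "sell")]

# One model follows the strategy exactly; the other adds an unprompted trade.
faithful = [Trade(0, "X", "buy"), Trade(2, "X", "sell")]
improviser = [Trade(0, "X", "buy"), Trade(1, "Y", "buy"), Trade(2, "X", "sell")]

print(adherence(faithful, reference))
print(adherence(improviser, reference))
```

If you report this number next to P&L, a model that makes money with low adherence is improvising, and a model with high adherence and poor P&L is telling you something about the strategy rather than the model.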
porttipasi | 26 days ago
But yeah, it's closer to a leaderboard right now.