(no title)
transformi | 11 months ago
In addition, it would be interesting to add a variation of the game in which the players can use tools and execute code to take their preparation one step further.
wongarsu | 11 months ago
At that point, I would also love to see sub-benchmarks showing how each model's score is affected by being given a schema vs. having to make one up, and whether the model does better with state in text vs. XML vs. JSON. Those don't tell you which model is best, but they are very useful to know when actually using the models.
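
To make the text vs. XML vs. JSON comparison concrete, here is a rough sketch of how a harness might render one and the same state in all three forms before handing it to a model. The state fields (`round`, `players`, `name`, `score`) and the function names are made up for illustration; nothing here is taken from the actual benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical game state; field names are invented for this example.
state = {
    "round": 3,
    "players": [
        {"name": "alice", "score": 5},
        {"name": "bob", "score": 7},
    ],
}


def as_text(s):
    """Render the state as one plain-English sentence pair."""
    scores = ", ".join(f"{p['name']} {p['score']}" for p in s["players"])
    return f"Round {s['round']}. Scores: {scores}."


def as_json(s):
    """Render the state as indented JSON."""
    return json.dumps(s, indent=2)


def as_xml(s):
    """Render the state as a small XML document (values become strings)."""
    root = ET.Element("state", round=str(s["round"]))
    for p in s["players"]:
        ET.SubElement(root, "player", name=p["name"], score=str(p["score"]))
    return ET.tostring(root, encoding="unicode")


# One rendering per format; a sub-benchmark would run the same task
# once per entry and compare the scores.
renderings = {
    "text": as_text(state),
    "json": as_json(state),
    "xml": as_xml(state),
}

if __name__ == "__main__":
    for name, body in renderings.items():
        print(f"--- {name} ---\n{body}\n")
```

The point of keeping a single `state` and three renderers is that the only variable between runs is the encoding, so any score difference can be attributed to the format rather than to the content.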