xdotli | 13 days ago
> opaque verifier

Could you specify which tasks have a verifier that is unclear or defective for benchmarking purposes?

> No problems involving existing codebases, refactors, or anything of the like

That is also not true. We have many such tasks, e.g. https://www.skillsbench.ai/tasks/fix-build-google-auto, https://www.skillsbench.ai/tasks/fix-build-agentops, and https://www.skillsbench.ai/tasks/react-performance-debugging