Jcampuzano2 | 1 month ago
I wouldn't be surprised if what this is actually benchmarking is just Claude Code's constant system prompt changes.
I wouldn't really trust it to benchmark Opus itself.