top | item 46811987

Jcampuzano2 | 1 month ago

Claude Code. They mention they are using Claude Code's CLI in the benchmark, and Claude Code changes constantly.

I wouldn't be surprised if what this is actually benchmarking is just Claude Code's constant system prompt changes.

I wouldn't really trust this to benchmark Opus itself.
