Does anyone know more about the benchmark? 60% accuracy gets a drumroll? How would Claude do? How would a human do? I tried the previous version and was not impressed, so I went back to Claude, which is very hard to beat and versatile with context enrichment.
