zen4ttitude | 24 days ago

Does anyone know more about the benchmark? Does 60% accuracy really merit a drumroll? How would Claude do? How would a human do? I tried the previous version and was not impressed. I went back to Claude, which is very hard to beat and versatile with context enrichment.
