arw0n | 16 days ago

Both people and benchmarks judge the quality of LLMs with fairly specific, narrow tests. People have biases, and benchmarks get gamed. In my own experience, Gemini seems lazy and scatter-brained compared to Claude, but shows stronger general-purpose reasoning. Anthropic is also clearly focusing heavily on making its models good at coding.

So it is reasonable that Claude might show significantly better coding ability on most tasks, while Gemini's stronger general reasoning proves useful on coding tasks that are complicated and obscure.
