item 46038007 (no title)

GodelNumbering | 3 months ago
Makes it sound like a one trick pony.

jascha_eng | 3 months ago
Anthropic is leaning into agentic coding, and heavily so. It makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark on which Google did not get the top spot last week. Claude remains king; that's all that matters here.

Mkengin | 3 months ago
I am eagerly awaiting swe-rebench results for November with all the new models: https://swe-rebench.com/

grantpitt | 3 months ago
Well, it's a big trick.