top | item 46995055

jbellis | 17 days ago

M2 was one of the most benchmaxxed models we've seen. There's a huge gap between its SWE-B results and its performance on tasks it hasn't been trained on. We'll put 2.5 on the list. https://brokk.ai/power-ranking
