deyiao | 1 year ago
The benchmark results seem unrealistically good, but I'm not sure from which angles I should challenge them.

ai-christianson | 1 year ago
I think they're real. The model is performing better than claude-3-5-sonnet-20241022 on the aider leaderboard: https://aider.chat/docs/leaderboards/