croddin | 8 months ago
"ARC-AGI-1:
* Low: 44%, $1.64/task
* Medium: 57%, $3.18/task
* High: 59%, $4.16/task

ARC-AGI-2:
* All reasoning efforts: <5%, $4-7/task

Takeaways:
* o3-pro in line with o3 performance
* o3's new price sets the ARC-AGI-1 Frontier"
saberience | 8 months ago
Given that the models don’t even see the versions we get to see, it doesn’t surprise me that they have issues with these. It’s not hard to make benchmarks so hard that humans and LLMs can’t do them.
nipah | 8 months ago
HDThoreaun | 8 months ago