the_duke | 24 days ago
That said ... I do think Codex 5.2 was the best coding model for more complex tasks, albeit quite slow.
So very much looking forward to trying out 5.3.
aurareturn | 24 days ago
I use 5.2 Codex for the entire task, then ask Opus 4.5 at the end to double-check the work. It's nice to have another frontier model's opinion and to ask it to spot any potential issues.
Looking forward to trying 5.3.
fooker | 24 days ago
Every new model overfits to the latest overhyped benchmark.
Someone should take this to its logical extreme and train a tiny model that scores better on one specific benchmark.
bunderbunder | 24 days ago
But even an imperfect yardstick is better than no yardstick at all. You've just got to remember to maintain a healthy level of skepticism.
mrandish | 24 days ago
It's not just overfitting to leading benchmarks; there are also too many degrees of freedom in how a model is tested (harness, etc.). Until there's standardized documentation enabling independent replication, it's all just benchmarketing.