item 46812297 (no title)

eterm|1 month ago
4. The graph starts January 8.
Why January 8? Was that an outlier high point?
IIRC, Opus 4.5 was released in late November.

F7F7F7|1 month ago
Right after the holiday double token promotion, users felt (perceived) a huge regression in capabilities. I bet that triggered the idea.

pertymcpert|1 month ago
People were away for the holidays. What do you want them to do?

littlestymaar|1 month ago
Or maybe, just maybe, that's when they started testing…

eterm|1 month ago
The Wayback Machine has nothing for this site before today, and the article is "last updated Jan 29".
A benchmark like this ought to start fresh from when it is published.
I don't entirely doubt the degradation, but the choice of where they went back to feels a bit cherry-picked to demonstrate the value of the benchmark.