DaKevK|13 days ago
Genuinely one of the more interesting model evals I've seen described. The sunk cost framing makes sense -- 4.5 doubles down, 4.6 cuts losses faster. 9 days vs 59 is a wild result. Makes me wonder how much of the regression complaints are from people hitting 4.6 on tasks where the first approach was obviously correct.
MrCheeze|13 days ago
https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQDvsy5D...