item 37386193 (no title)

alexedw | 2 years ago
This is silly. Look at the loss and benchmark curves for the Pythia suite of models - the smaller models certainly did saturate and in fact began worsening. 2T not saturating on a 7B is very different from 3T on a 1B.

littlestymaar|2 years ago
That's the point of the experiment actually…
littlestymaar|2 years ago