top | item 37386193

alexedw | 2 years ago

This is silly. Look at the loss and benchmark curves for the Pythia suite of models - the smaller models certainly did saturate and in fact began worsening.

2T tokens not saturating a 7B model is very different from 3T tokens on a 1B model.
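
To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It takes the 2T/7B and 3T/1B figures at face value as token and parameter counts; the helper name `tokens_per_param` is made up for illustration and the ratio is only a crude proxy for how hard a model is being pushed.

```python
def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter (hypothetical helper)."""
    return tokens / params

# Assumed figures, read directly off the comment above.
ratio_7b = tokens_per_param(2e12, 7e9)  # 2T tokens on a 7B model
ratio_1b = tokens_per_param(3e12, 1e9)  # 3T tokens on a 1B model

print(f"7B: ~{ratio_7b:.0f} tokens/param")
print(f"1B: ~{ratio_1b:.0f} tokens/param")
print(f"1B run is ~{ratio_1b / ratio_7b:.1f}x more tokens per parameter")
```

So the 1B run sees roughly ten times more tokens per parameter, which is why one result doesn't obviously carry over to the other.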

littlestymaar|2 years ago

That's the point of the experiment actually…