top | item 46305760

(no title)

gpjt | 2 months ago

Awesome, thanks! I'm still doing trains on the big machines right now (hopefully will write up over xmas) but I think once I've worked out the sweet spot for memgatokens per dollar for this model, it's time to start tweaking the other controls -- LR and cosine variation of it, as you said, and also dropout, bias, weight tying, and definitely gradient clipping (which should at least get better bang for the buck from time/$ spent). I'll leave it to Google to follow up Chinchilla with a "best batch size across a thousand trained models" paper ;-)

discuss

order

No comments yet.