(no title)
zaptrem | 4 months ago
Afaik embedding and norm params are excluded from weight decay as standard practice. Is this no longer true?
E.g., they exclude them in minGPT: https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab...
zaptrem | 4 months ago
Afaik embedding and norm params are excluded from weight decay as standard practice. Is this no longer true?
E.g., they exclude them in minGPT: https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab...
levocardia|4 months ago
3abiton|4 months ago