ash-ishh | 2 years ago
"Parameters are coefficients inside the model that are adjusted by the training procedure. The dataset is what you train the model on. Language models are trained with tokens that are subword units (e.g. prefix, root, suffix)."
His comment on GPT-4's parameter count:
"Also: a model with more parameters is not necessarily better. It's generally more expensive to run and requires more RAM than a single GPU card can have. GPT-4 is rumored to be a "mixture of experts", i.e. a neural net consisting of multiple specialized modules, only one of which is run on any particular prompt. So the effective number of parameters used at any one time is smaller than the total number."
kurtoid | 2 years ago