ash-ishh | 2 years ago
"Parameters are coefficients inside the model that are adjusted by the training procedure. The dataset is what you train the model on. Language models are trained with tokens that are subword units (e.g. prefix, root, suffix)."
His comment on GPT-4's parameter count:
"Also: a model with more parameters is not necessarily better. It's generally more expensive to run and requires more RAM than a single GPU card can have. GPT-4 is rumored to be a "mixture of experts", i.e. a neural net consisting of multiple specialized modules, only one of which is run on any particular prompt. So the effective number of parameters used at any one time is smaller than the total number."
kurtoid | 2 years ago