idontknowmuch | 1 year ago
And I'd be curious about the utility of a model that scales up and down at inference: if that were the case, you'd still need storage equal to the maximum model size. This would essentially be useless for embedded applications, etc., unless you apply heavy quantization, but quantization in a small parameter space would probably make the smaller models useless. I could see the benefit here in terms of optimizing latency for different applications, but maybe you have other ideas.
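To make the storage point concrete, here's a minimal sketch (names and widths are illustrative, not from any particular paper) of a single linear layer that can be sliced down at inference time: running a narrower slice cuts compute and latency, but the weights you have to keep on disk are always those of the largest configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

FULL_WIDTH = 1024  # the maximum model size; storage is dictated by this
W = rng.standard_normal((FULL_WIDTH, FULL_WIDTH)).astype(np.float32)

def forward(x, width):
    """Run the layer using only the first `width` units (a nested sub-model)."""
    return x[:width] @ W[:width, :width]

x = rng.standard_normal(FULL_WIDTH).astype(np.float32)
small_out = forward(x, 256)        # cheaper compute, lower latency
full_out = forward(x, FULL_WIDTH)  # full-quality path, same stored weights

# Whichever width we run, the full matrix must be stored:
print(f"{W.nbytes / 1e6:.1f} MB on disk either way")  # 4.2 MB
```

So on an embedded device the smaller slice buys you latency, not memory, which is the trade-off being questioned above.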
Given all that, I think training for a smaller number of parameters, as noted in OP, would kind of beat out a model that scales at inference time, especially when most people know what kind of application they're aiming to build and the required level of performance.