This is really interesting. I think force fields in molecular dynamics have undergone a similar NN revolution: you train your NN on the output of expensive calculations to replace the expensive function with a cheap one. Could you train a small language model on the outputs of a big one in the same way?
lossolo|7 months ago
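Yes, it's called distillation (or knowledge distillation): the small "student" model is trained to match the outputs of the big "teacher" model.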
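
For anyone who wants to see what the training signal could look like, here is a minimal sketch of one common distillation objective: the KL divergence between temperature-softened teacher and student distributions over the next token. It uses only the standard library. The function names, the temperature of 2.0, and the toy logits are all illustrative assumptions, not taken from any particular model or library, and real setups usually mix this term with an ordinary cross-entropy loss on the true labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures. The default of 2.0 is an
    arbitrary illustrative choice.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the big model
    q = softmax(student_logits, temperature)  # predictions from the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Toy logits over a 3-token vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
bad_student = [0.5, 1.0, 4.0]   # prefers a different token

loss_good = distillation_loss(teacher, good_student)
loss_bad = distillation_loss(teacher, bad_student)
print(loss_good < loss_bad)
```

The student that roughly agrees with the teacher gets the lower loss, so minimizing this quantity pulls the cheap model toward the expensive one, much like fitting an NN force field to the outputs of expensive calculations.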