Knowledge distillation is very interesting, but generating trillions of outputs from a large teacher model seems insanely expensive. Is this really more cost-efficient than just spending that compute on training your model with more data/more epochs?
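(For concreteness, here is a minimal sketch of what the per-example teacher cost looks like in standard soft-label distillation; `student`, `teacher`, and `distillation_step` are illustrative names and not from the article. The point is that every training example needs an extra inference pass through the much larger frozen teacher to produce the soft targets.)

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, temperature=2.0):
    """One training step of soft-label knowledge distillation (sketch)."""
    with torch.no_grad():                 # teacher is frozen, inference only
        teacher_logits = teacher(batch)   # this forward pass is the extra cost

    student_logits = student(batch)

    # KL divergence between temperature-softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return loss
```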
DebtDeflation|1 year ago
astrange|1 year ago
It does seem to be true that clean data works better than low-quality data.
Workaccount2|1 year ago
agi_is_coming|1 year ago