aazo11|10 months ago
Wouldn't a better solution be to train or fine-tune the smaller model on the larger model's responses, and only push inference to the edge if the smaller model performs well enough and the hardware can handle the workload?
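
Roughly what I have in mind for the gating step, as a minimal sketch. Everything here is a made-up assumption for illustration: the `EdgeDevice` fields, the exact-match agreement metric, and the thresholds are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass
class EdgeDevice:
    # Hypothetical hardware specs reported by the device.
    free_memory_gb: float
    tokens_per_second: float


def agreement_rate(small_outputs, large_outputs):
    """Fraction of prompts where the small model's answer equals the large model's."""
    if not large_outputs:
        return 0.0
    matches = sum(s == l for s, l in zip(small_outputs, large_outputs))
    return matches / len(large_outputs)


def should_run_on_edge(small_outputs, large_outputs, device,
                       min_agreement=0.9,
                       model_memory_gb=4.0,
                       min_tokens_per_second=10.0):
    """Push inference to the edge only if both checks pass (thresholds are assumed)."""
    # Check 1: the fine-tuned small model tracks the large model closely enough.
    performant = agreement_rate(small_outputs, large_outputs) >= min_agreement
    # Check 2: the device has enough memory and throughput for the workload.
    fits = (device.free_memory_gb >= model_memory_gb
            and device.tokens_per_second >= min_tokens_per_second)
    return performant and fits
```

Exact-match agreement is a crude proxy; in practice you would swap in whatever eval metric fits the task and fall back to the larger model when either check fails.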
monoid73|10 months ago