

yberreby | 2 months ago

Based on what works elsewhere in deep learning, I see no reason why you couldn't train once with a randomized number of experts, then set that number during inference based on your desired compute-accuracy tradeoff. I would expect that this has been done in the literature already.
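To make the idea concrete, here is a rough sketch of what I have in mind, not a reference to any specific paper or library. It assumes a top-k routed mixture-of-experts layer where k is drawn at random during training and fixed by hand at inference. Everything named here (`top_k_gate`, `moe_forward`, `sample_k`, the toy scalar experts, the uniform sampling of k) is hypothetical and chosen only for illustration.

```python
import math
import random


def top_k_gate(logits, k):
    """Softmax over the k largest router logits; unselected experts get no weight."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_forward(x, router_weights, experts, k):
    """Route a scalar input to k experts and mix their outputs by gate weight."""
    logits = [w * x for w in router_weights]
    gate = top_k_gate(logits, k)
    return sum(g * experts[i](x) for i, g in gate.items())


def sample_k(num_experts, rng):
    """Assumption: k is drawn uniformly from 1..num_experts at each training step."""
    return rng.randint(1, num_experts)


# Toy setup (hypothetical): four linear "experts" and a fixed linear router.
experts = [lambda x, a=a: a * x for a in (0.5, 1.0, 1.5, 2.0)]
router_weights = [0.1, 0.2, 0.3, 0.4]

# Training: a different number of active experts on every step.
rng = random.Random(0)
for step in range(3):
    k = sample_k(len(experts), rng)
    y = moe_forward(1.0, router_weights, experts, k)
    # A real training step would compute a loss on y and update parameters here.

# Inference: pick k to match the compute budget.
cheap = moe_forward(1.0, router_weights, experts, k=1)
full = moe_forward(1.0, router_weights, experts, k=len(experts))
```

Whether a model trained this way actually degrades gracefully as k shrinks is an empirical question; the sketch only shows where the randomization would sit.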
