bick_nyers | 9 months ago

Or merge the bottom 1/8 (or whatever fraction) of the experts together and, optionally, do some minimal training with all other weights frozen. You would need to modify the MoE routers slightly to map old -> new expert indices, so that you don't need to retrain the routers.
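A rough sketch of what that could look like, not tied to any particular MoE implementation. Everything here is an assumption for illustration: "bottom" is taken to mean least-routed-to (by a hypothetical `usage_counts` list), merging is done by plain averaging of the expert weights, each expert is a flat list of floats, and the function names `merge_least_used_experts` and `route` are made up. The router itself is left untouched and still scores the old experts; only its selected indices are passed through the old -> new table.

```python
def merge_least_used_experts(expert_weights, usage_counts, fraction=1 / 8):
    """Merge the least-used experts into one and build an old -> new index map.

    expert_weights: list of flat weight vectors, one per expert (assumed layout).
    usage_counts:   how often each expert was selected (assumed ranking criterion).
    """
    num_experts = len(expert_weights)
    # Merging needs at least two experts to be meaningful.
    num_merge = min(num_experts, max(2, int(num_experts * fraction)))

    by_usage = sorted(range(num_experts), key=lambda i: usage_counts[i])
    merged_ids = sorted(by_usage[:num_merge])
    kept_ids = sorted(by_usage[num_merge:])

    # Kept experts keep their relative order; the merged expert goes last.
    old_to_new = [0] * num_experts
    for new_idx, old_idx in enumerate(kept_ids):
        old_to_new[old_idx] = new_idx
    merged_slot = len(kept_ids)
    for old_idx in merged_ids:
        old_to_new[old_idx] = merged_slot

    # Plain average of the merged experts' weights (one possible merge rule).
    dim = len(expert_weights[0])
    merged = [
        sum(expert_weights[i][d] for i in merged_ids) / len(merged_ids)
        for d in range(dim)
    ]
    new_weights = [list(expert_weights[i]) for i in kept_ids] + [merged]
    return new_weights, old_to_new


def route(router_logits, old_to_new, top_k=2):
    """Pick top-k experts from the unchanged router, then remap the indices."""
    top_old = sorted(range(len(router_logits)), key=lambda i: -router_logits[i])
    return [old_to_new[i] for i in top_old[:top_k]]
```

One thing the sketch does not handle: if two of the top-k old experts both land on the merged slot, the same new index appears twice, so a real version would have to decide whether to sum their gate weights or pick the next-best expert instead.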