Maxious | 2 days ago
There are some experiments on simply removing or merging experts post-training to shrink models even further https://bknyaz.github.io/blog/2026/moe/
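A minimal sketch of what "removing or merging experts" can mean mechanically, on a toy top-1-routed MoE layer. This is not the method from the linked post; the layer structure, the merge-by-averaging rule, and all dimensions are illustrative assumptions.

```python
# Toy MoE layer with post-training expert pruning and merging.
# Everything here (sizes, top-1 routing, averaging merge) is illustrative.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=16, d_ff=32, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (batch, d_model)
        top = self.router(x).argmax(dim=-1)      # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

def prune_experts(moe, keep_ids):
    """Drop every expert not in keep_ids; shrink the router to match."""
    moe.experts = nn.ModuleList(moe.experts[i] for i in keep_ids)
    w = moe.router.weight.data[list(keep_ids)]   # keep matching router rows
    moe.router = nn.Linear(w.shape[1], len(keep_ids), bias=False)
    moe.router.weight.data = w
    return moe

@torch.no_grad()
def merge_experts(moe, i, j):
    """Average expert j's weights into expert i, then drop j."""
    for p_i, p_j in zip(moe.experts[i].parameters(),
                        moe.experts[j].parameters()):
        p_i.mul_(0.5).add_(0.5 * p_j)
    return prune_experts(moe, [k for k in range(len(moe.experts)) if k != j])

moe = ToyMoE()
moe = merge_experts(moe, 0, 7)                   # 8 -> 7 experts
moe = prune_experts(moe, [0, 1, 2, 3])           # 7 -> 4 experts
print(moe(torch.randn(4, 16)).shape)             # torch.Size([4, 16])
```

Which experts to drop or merge (e.g. by routing frequency or weight similarity) is the interesting part, and that is what the linked post explores.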
vlovich123 | 1 day ago
Shrinking them, sure, but I've seen nothing indicating you can just page expert weights in and out without cratering performance, the same way you would with a non-MoE model.
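For concreteness, a minimal sketch of the naive "page experts in on demand" scheme this comment is skeptical of: experts parked on CPU and copied to the GPU only when the router selects them. Top-1 routing and all sizes are assumptions; the timing just makes the per-forward host-to-device copy visible.

```python
# Naive expert paging: experts live on CPU and are moved to the GPU
# per forward pass. The blocking transfers are the overhead at issue.
import time
import torch
import torch.nn as nn

d_model, d_ff, n_experts = 1024, 4096, 16
device = "cuda" if torch.cuda.is_available() else "cpu"

experts = [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                         nn.Linear(d_ff, d_model)) for _ in range(n_experts)]
router = nn.Linear(d_model, n_experts, bias=False).to(device)
x = torch.randn(8, d_model, device=device)

t0 = time.perf_counter()
with torch.no_grad():
    chosen = router(x).argmax(dim=-1)        # top-1 routing
    out = torch.zeros_like(x)
    for i in chosen.unique().tolist():
        expert = experts[i].to(device)       # page in: blocking copy
        mask = chosen == i
        out[mask] = expert(x[mask])
        expert.to("cpu")                     # page out (Module.to is in-place)
if device == "cuda":
    torch.cuda.synchronize()
print(f"paged forward: {time.perf_counter() - t0:.3f}s")
```

Every forward pass pays PCIe transfer cost for each selected expert, which is why paging tends to crater latency unless routing is predictable enough to prefetch.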