(no title)
popinman322 | 11 months ago
Depending on the routing function, you can work out all the active experts for a single token ahead of the forward pass and pipeline the expert loading.
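A rough sketch of the idea, assuming a routing function that depends only on the token id and layer index (hash-style routing) rather than on hidden states. A learned router that reads the hidden state would not allow this. The names `route`, `load_expert`, and `forward` are hypothetical, and the "experts" are toy functions standing in for real weights:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_LAYERS = 4
NUM_EXPERTS = 8


def route(token_id: int, layer: int) -> int:
    # Hypothetical routing function: depends only on token id and layer,
    # so it can be evaluated before any layer has run.
    return (token_id * 31 + layer * 17) % NUM_EXPERTS


def load_expert(layer: int, expert: int):
    # Stand-in for a slow load (e.g. disk or host memory to accelerator).
    # Returns a toy "expert" that adds a constant to its input.
    scale = layer * NUM_EXPERTS + expert + 1
    return lambda x: x + scale


def forward(token_id: int, x: float) -> float:
    # The full list of active experts is known before the forward pass starts.
    plan = [route(token_id, layer) for layer in range(NUM_LAYERS)]

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start loading the first expert right away.
        pending = pool.submit(load_expert, 0, plan[0])
        for layer in range(NUM_LAYERS):
            expert_fn = pending.result()  # wait for this layer's expert
            if layer + 1 < NUM_LAYERS:
                # Kick off the next load so it overlaps with this layer's compute.
                pending = pool.submit(load_expert, layer + 1, plan[layer + 1])
            x = expert_fn(x)
    return x
```

With one expert per layer and a single worker, at most one load is in flight while the current layer computes. A real setup would prefetch several experts per layer and bound how many are resident at once.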
boroboro4 | 11 months ago