(no title)
yorwba | 15 hours ago
Batching leads to cross-contamination in practice because of things like MoE load-balancing within the batch, or supporting different batch sizes with different kernels that have different numerical behavior. But a careful implementation could avoid such issues while still benefiting from the higher efficiency of batching.
No comments yet.