nicohayes | 5 months ago

Could you clarify whether the "2B active parameters" concept refers to the parameters activated per token at inference time, and how this scales with context length? Specifically, how does MoE affect which parameters are activated during inference, and what are the practical implications for latency?
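For context on what "active parameters per token" typically means in a top-k MoE: each token is routed to only k of the experts in every MoE layer, so per-token compute (and hence per-token latency) tracks the active count rather than the total parameter count. A minimal sketch of that accounting, using entirely made-up numbers (expert count, expert size, top-k, and layer count are illustrative assumptions, not figures for the model under discussion):

```python
def active_params_per_token(
    shared_params: int,   # attention, embeddings, etc. -- always active
    n_experts: int,       # total experts per MoE layer
    expert_params: int,   # parameters in one expert's FFN
    top_k: int,           # experts each token is routed to
    n_moe_layers: int,    # number of MoE layers in the model
) -> tuple[int, int]:
    """Return (total_params, active_params_per_token) for a top-k MoE."""
    total = shared_params + n_moe_layers * n_experts * expert_params
    active = shared_params + n_moe_layers * top_k * expert_params
    return total, active

# Hypothetical configuration: 64 experts per layer, 2 routed per token.
total, active = active_params_per_token(
    shared_params=500_000_000,
    n_experts=64,
    expert_params=30_000_000,
    top_k=2,
    n_moe_layers=24,
)
# Only a small fraction of the total weights participate in each
# token's forward pass; here roughly 2B active out of ~47B total.
```

Note that this accounting is per token and independent of context length; longer contexts increase attention cost and KV-cache size, not the number of expert parameters activated per token.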
No comments yet.