nicohayes | 5 months ago

Could you clarify whether the "2B active parameters" concept refers to the parameters activated per token at inference time, and how this scales with context length? Specifically, how does MoE affect which parameters are activated during inference, and what are the practical implications for latency?
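For context on what "active parameters per token" typically means in a top-k MoE: each token is routed to only k of the experts in every MoE layer, so per-token compute (and hence per-token latency) tracks the active count rather than the total parameter count. A minimal sketch of that accounting, using entirely made-up numbers (expert count, expert size, top-k, and layer count are illustrative assumptions, not figures for the model under discussion):

```python
def active_params_per_token(
    shared_params: int,   # attention, embeddings, etc. -- always active
    n_experts: int,       # total experts per MoE layer
    expert_params: int,   # parameters in one expert's FFN
    top_k: int,           # experts each token is routed to
    n_moe_layers: int,    # number of MoE layers in the model
) -> tuple[int, int]:
    """Return (total_params, active_params_per_token) for a top-k MoE."""
    total = shared_params + n_moe_layers * n_experts * expert_params
    active = shared_params + n_moe_layers * top_k * expert_params
    return total, active

# Hypothetical configuration: 64 experts per layer, 2 routed per token.
total, active = active_params_per_token(
    shared_params=500_000_000,
    n_experts=64,
    expert_params=30_000_000,
    top_k=2,
    n_moe_layers=24,
)
# Only a small fraction of the total weights participate in each
# token's forward pass; here roughly 2B active out of ~47B total.
```

Note that this accounting is per token and independent of context length; longer contexts increase attention cost and KV-cache size, not the number of expert parameters activated per token.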
No comments yet.