johndough | 15 days ago

Only if chip-to-chip communication were a bottleneck, which it isn't.

If a layer fits completely in SRAM (as is probably the case for Cerebras), the only thing you have to send between chips for each token is its hidden state. The hidden states are very small (7168 floats for DeepSeek-V3.2 https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/c... ), so transferring them won't be a bottleneck.
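To put a rough number on "very small", here is a back-of-the-envelope sketch. Only the hidden size of 7168 comes from the comment above. The 2 bytes per value (bf16) and the 1000 tokens per second are assumptions picked for illustration, not measured figures for any particular chip, and the function names are made up for this example.

```python
HIDDEN_SIZE = 7168  # hidden state width quoted above for DeepSeek-V3.2


def hidden_state_bytes(hidden_size: int, bytes_per_value: int) -> int:
    """Bytes needed to hand one token's hidden state to the next chip."""
    return hidden_size * bytes_per_value


def link_bytes_per_second(hidden_size: int, bytes_per_value: int,
                          tokens_per_second: int) -> int:
    """Traffic over one chip-to-chip boundary at a given token rate."""
    return hidden_state_bytes(hidden_size, bytes_per_value) * tokens_per_second


if __name__ == "__main__":
    # Assumption: activations are sent as bf16 (2 bytes per value).
    per_token = hidden_state_bytes(HIDDEN_SIZE, 2)
    print(per_token)  # 14336 bytes, i.e. 14 KiB per token

    # Assumption: 1000 tokens per second crossing the boundary.
    print(link_bytes_per_second(HIDDEN_SIZE, 2, 1000))  # 14336000 bytes/s
```

Under those assumptions that is about 14 MB/s per chip boundary, and only twice that if the values were sent as 4-byte floats.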

Things get more complicated if a layer does not fit in SRAM, but it still works out fine in the end.
