top | item 43975249

MichaelMoser123 | 9 months ago

DeepSeek-V2, V3, and R1 all use multi-head attention.
