top | item 43969937

olq_plo | 9 months ago

Very cool idea. Can't wait for converted models on HF.


MichaelMoser123 | 9 months ago

DeepSeek-V2, V3, and R1 all use multi-headed attention.