item 43969937 (no title)

olq_plo | 9 months ago
Very cool idea. Can't wait for converted models on HF.

MichaelMoser123 | 9 months ago
deepseek-v2, v3, r1 are all using multi-headed attention.