guiriduro | 2 months ago

Apriel-H1-15b-Thinker-SFT is built by incremental distillation from Apriel-Nemotron-15B-Thinker: the less critical attention layers are selectively replaced with Mamba blocks, which scale linearly with sequence length rather than quadratically, to reduce computational cost while preserving reasoning quality.
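
To make the "selective replacement" idea concrete, here is a minimal sketch of how a staged layer plan might be built. It is not the actual Apriel-H1 procedure: the per-layer importance scores, the function names `plan_hybrid_layers` and `incremental_schedule`, and the fixed step size are all hypothetical, and the distillation training that would run between stages is omitted entirely.

```python
from typing import List


def plan_hybrid_layers(importance: List[float], num_to_replace: int) -> List[str]:
    """Mark the `num_to_replace` least important layers as 'mamba'.

    `importance` is a hypothetical per-layer score (higher = more critical);
    the post does not say how criticality is actually measured.
    """
    if not 0 <= num_to_replace <= len(importance):
        raise ValueError("num_to_replace must be between 0 and the layer count")
    # Rank layer indices by ascending importance; ties go to the lower index.
    ranked = sorted(range(len(importance)), key=lambda i: (importance[i], i))
    to_replace = set(ranked[:num_to_replace])
    return [
        "mamba" if i in to_replace else "attention"
        for i in range(len(importance))
    ]


def incremental_schedule(
    importance: List[float], step: int, total: int
) -> List[List[str]]:
    """Return one layer plan per stage, swapping `step` more layers each time.

    In a real pipeline each stage would be followed by distillation from the
    teacher model; that part is assumed and not shown here.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    plans = []
    replaced = 0
    while replaced < total:
        replaced = min(replaced + step, total)
        plans.append(plan_hybrid_layers(importance, replaced))
    return plans


if __name__ == "__main__":
    # Made-up scores for a 6-layer toy model.
    scores = [0.9, 0.2, 0.5, 0.1, 0.7, 0.3]
    for stage, plan in enumerate(incremental_schedule(scores, step=2, total=4), 1):
        print(stage, plan)
```

Replacing a few layers per stage, rather than all at once, is one way to read "incremental": the student only has to recover from a small architectural change at each step.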
