Mkengin | 6 months ago
$env:LLAMA_SET_ROWS = "1"; ./llama-server -c 140000 -m D:\ik_llama.cpp\build\bin\Release\models\Qwen3-Coder-30B-A3B-Instruct-IQ4_KSS.gguf -ngl 999 --flash-attn -ctk q8_0 -ctv q8_0 -ot "blk\.(19|2[0-9]|3[0-9]|4[0-7])\.ffn_.*_exps\.=CPU" --temp 0.7 --top-p 0.8 --top-k 20 --repeat_penalty 1.05 --threads 8
In my case I offload the expert tensors (ffn_*_exps) of layers 19-47 to the CPU. With more VRAM you might only need to offload layers 37-47, in which case the -ot argument would be "blk\.(3[7-9]|4[0-7])\.ffn_.*_exps\.=CPU"
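If you want to check which layers a given -ot pattern actually hits before launching, here is a rough Python sketch. It assumes the model has 48 blocks (0-47) and that the expert tensors are named blk.N.ffn_gate_exps.weight, blk.N.ffn_up_exps.weight and blk.N.ffn_down_exps.weight; both are my assumptions about this GGUF, so verify them against your own file. The helper name offloaded_layers is made up for illustration, and Python's re is only an approximation of how the server matches the pattern.

```python
import re

# Assumed expert tensor name stems for a Qwen3 MoE GGUF (not confirmed in this thread).
EXPERT_TENSORS = ("ffn_gate_exps", "ffn_up_exps", "ffn_down_exps")


def offloaded_layers(override, n_layers=48):
    """Return (buffer_type, layers) for an -ot value such as 'REGEX=CPU'.

    A layer is listed only if all of its assumed expert tensors match the regex.
    n_layers=48 is an assumption about this model.
    """
    # Split on the last '=': left side is the regex, right side is the buffer type.
    pattern, _, buffer_type = override.rpartition("=")
    rx = re.compile(pattern)
    layers = [
        i
        for i in range(n_layers)
        if all(rx.search(f"blk.{i}.{name}.weight") for name in EXPERT_TENSORS)
    ]
    return buffer_type, layers


mine = r"blk\.(19|2[0-9]|3[0-9]|4[0-7])\.ffn_.*_exps\.=CPU"
suggested = r"blk\.(3[7-9]|4[0-7])\.ffn_.*_exps\.=CPU"

for label, override in (("mine", mine), ("suggested", suggested)):
    buffer_type, layers = offloaded_layers(override)
    print(f"{label}: {len(layers)} layers -> {buffer_type}: {layers[0]}-{layers[-1]}")
```

Widening or narrowing the layer range is then just a matter of editing the alternation in the first group and re-running the check.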
magicalhippo|6 months ago