(no title)
tbocek|10 months ago
This is probably due to this: https://github.com/ggml-org/llama.cpp/issues/12637. That GitHub issue is about interleaved sliding window attention (iSWA) not being available in llama.cpp for Gemma 3. Implementing it could reduce memory requirements a lot: for one scenario they mention going from 62GB to 10GB.
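To see where a saving like that could come from, here's a back-of-the-envelope sketch of KV-cache sizes. The config numbers (62 layers, 16 KV heads, head dim 128, 1024-token window, 1 global layer per 6) are my assumptions for a Gemma-3-27B-like model, not llama.cpp's actual accounting:

  def kv_cache_bytes(n_layers, n_kv_heads, head_dim, tokens, bytes_per_elem=2):
      # K and V each store n_kv_heads * head_dim values per token per layer (fp16).
      return n_layers * 2 * n_kv_heads * head_dim * tokens * bytes_per_elem

  n_layers, n_kv_heads, head_dim = 62, 16, 128   # assumed model config
  ctx, window = 128_000, 1024                    # full context vs. sliding window

  # Without iSWA: every layer caches keys/values for the full context.
  full = kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx)

  # With iSWA: assume 1 in 6 layers is global; the rest are sliding-window
  # layers that only need to cache the last `window` tokens.
  global_layers = n_layers // 6
  local_layers = n_layers - global_layers
  iswa = (kv_cache_bytes(global_layers, n_kv_heads, head_dim, ctx)
          + kv_cache_bytes(local_layers, n_kv_heads, head_dim, window))

  print(f"full attention KV cache: {full / 2**30:.1f} GiB")   # ~60.5 GiB
  print(f"iSWA KV cache:           {iswa / 2**30:.1f} GiB")   # ~10.2 GiB

Under those assumptions the numbers land close to the 62GB vs 10GB figures from the issue, since most layers shrink from 128k cached tokens to 1k.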
parched99|10 months ago
Best to have two or more low-end 16GB GPUs, giving at least 32GB of VRAM in total, to run most of the better local models.
nolist_policy|10 months ago