I think they should start aiming for 20B models alongside 32B and 7B. A 7B model is usually enough for an 8GB GPU, while a 32B model needs a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS, but it's not ideal). The 20-ish B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
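The fit-per-GPU intuition above comes down to simple arithmetic: weight memory is roughly parameter count times bits-per-weight. A rough back-of-envelope sketch (the bits-per-weight figures are my approximations for common llama.cpp quants, and this counts weights only, ignoring KV cache and activation overhead):

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model.

    Weights only -- KV cache, activations, and runtime overhead
    add a few more GB on top, which is why a 19GB model is tight
    on a 24GB card.
    """
    return params_billion * bits_per_weight / 8

# Approximate bits-per-weight (assumption, varies by quant scheme):
#   IQ3_XXS ~ 3.06 bpw, Q4_K_M ~ 4.85 bpw
print(quant_size_gb(7, 4.85))    # 7B at Q4_K_M  -> ~4.2 GB, fits 8GB
print(quant_size_gb(20, 4.5))    # 20B at ~4.5bpw -> ~11.3 GB, fits 16GB
print(quant_size_gb(32, 4.85))   # 32B at Q4_K_M -> ~19.4 GB, tight on 24GB
print(quant_size_gb(32, 3.06))   # 32B at IQ3_XXS -> ~12.2 GB
```

The numbers line up with the claim: a 32B model only drops under 16GB by going to an aggressive ~3-bit quant, while a 20B model fits comfortably at 4-bit.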
embedding-shape|3 months ago
Personally, I hope GPU makers instead start adding more VRAM, or if one can dream, expandable VRAM.
refulgentis|3 months ago