parched99 | 10 months ago
Prompt Tokens: 10
Time: 229.089 ms
Speed: 43.7 t/s
Generation Tokens: 41
Time: 959.412 ms
Speed: 42.7 t/s
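Each Speed figure is the token count divided by the elapsed time. The quick check below reproduces both numbers. The helper name tokens_per_second is only for illustration and is not part of any tool's output or API.

```python
def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Throughput in tokens/s from a token count and a duration in milliseconds."""
    return tokens / (elapsed_ms / 1000.0)

# Figures reported in the comment above
prompt_tps = tokens_per_second(10, 229.089)
gen_tps = tokens_per_second(41, 959.412)
print(f"prompt: {prompt_tps:.1f} t/s, generation: {gen_tps:.1f} t/s")
# prints "prompt: 43.7 t/s, generation: 42.7 t/s"
```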
tbocek|10 months ago
parched99|10 months ago
Best to have two or more low-end 16GB GPUs (32GB or more of VRAM in total) to run most of the better local models.
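The "two 16GB cards" reasoning comes down to whether the weights plus the KV cache fit in the combined VRAM. Here is a rough sketch. The 1GB-per-card overhead for compute buffers and driver context is an assumed allowance, the model sizes are hypothetical, and the check ignores how a runtime actually splits layers across cards.

```python
def fits_in_vram(weights_gb, kv_cache_gb, gpu_vram_gb, overhead_gb_per_gpu=1.0):
    """Rough check: do weights + KV cache fit in the summed VRAM of several GPUs?

    overhead_gb_per_gpu is an assumed per-card allowance (compute buffers, context).
    """
    usable = sum(v - overhead_gb_per_gpu for v in gpu_vram_gb)
    return weights_gb + kv_cache_gb <= usable

two_16gb_cards = [16, 16]
# Hypothetical model: ~18 GB of quantized weights plus ~4 GB of KV cache
print(fits_in_vram(18.0, 4.0, two_16gb_cards))  # True: 22 <= 30
print(fits_in_vram(18.0, 4.0, [16]))            # False: 22 > 15
# Hypothetical larger model: ~37 GB of weights
print(fits_in_vram(37.0, 4.0, two_16gb_cards))  # False: 41 > 30
```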
nolist_policy|10 months ago
idonotknowwhy|10 months ago
If you want a bit more context, try -ctk q8_0 -ctv q8_0 (from memory, so look it up) to quantize the KV cache.
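To see why quantizing the KV cache buys context, here is a back-of-the-envelope size estimate for standard multi-head/grouped-query attention. It ignores padding and runtime-specific extras. The model shape (32 layers, 8 KV heads, head_dim 128) is a hypothetical example. The q8_0 size follows from ggml's block layout: 32 int8 values plus one fp16 scale, or 34 bytes per 32 elements.

```python
# Bytes per cached element. q8_0: 34 bytes per block of 32 values = 1.0625 bytes.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32}

def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    # Factor 2: each layer caches one K tensor and one V tensor.
    elems = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elems * BYTES_PER_ELEMENT[cache_type]

# Hypothetical model shape, roughly that of an 8B GQA model
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)
MIB = 1024 ** 2
for t in ("f16", "q8_0"):
    print(t, kv_cache_bytes(8192, cache_type=t, **shape) / MIB, "MiB at 8192 ctx")
# prints "f16 1024.0 MiB at 8192 ctx" then "q8_0 544.0 MiB at 8192 ctx"
```

In a fixed memory budget, q8_0 therefore fits about 2 / 1.0625 ≈ 1.9 times as many tokens of context as f16.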
Also, an imatrix GGUF quant such as IQ4_XS might be smaller and still better quality.
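Weight size scales roughly with parameters × bits per weight / 8. The importance matrix (imatrix) affects quality at a given size, not the size itself. The sketch below compares approximate sizes for a hypothetical 32B-parameter model. The average bpw values are assumptions: real files mix tensor types, so actual sizes differ somewhat.

```python
def gguf_size_gb(n_params, bits_per_weight):
    """Approximate weight size in GB (1e9 bytes): params * bpw / 8."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed average bits per weight; actual files vary by tensor mix.
APPROX_BPW = {"Q4_K_M": 4.85, "IQ4_XS": 4.25}

n_params = 32e9  # hypothetical 32B-parameter model
for name, bpw in APPROX_BPW.items():
    print(f"{name}: ~{gguf_size_gb(n_params, bpw):.1f} GB")
# prints "Q4_K_M: ~19.4 GB" then "IQ4_XS: ~17.0 GB"
```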
parched99|10 months ago
floridianfisher|10 months ago
parched99|10 months ago