roosgit | 1 year ago
- B550MH motherboard
- Ryzen 3 4100 CPU
- 32GB (2x16) RAM cranked up to 3200MHz (token generation is memory-bound)
- 256GB M.2 NVMe (helps load models faster)
- Nvidia 3060 12GB
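Because generation is memory-bound, a rough ceiling on CPU tokens/s is peak RAM bandwidth divided by the bytes read per token (roughly the whole model). The sketch below does that arithmetic under assumptions that are mine, not the post's: dual-channel DDR4 with an 8-byte bus per channel, "3200MHz" read as 3200 MT/s, a hypothetical 2133 MT/s non-XMP baseline for comparison, and a Q4_K_M 8B file of about 4.9 GB.

```python
# Rough ceiling on CPU token generation, treating it as memory-bound:
# every generated token streams (roughly) all model weights from RAM once.
# Assumptions, not measurements from the post:
#   - dual-channel DDR4 with a 64-bit (8-byte) bus per channel
#   - "3200MHz" means 3200 MT/s; 2133 MT/s is a hypothetical non-XMP baseline
#   - the Q4_K_M 8B weights are about 4.9 GB

def peak_bandwidth_gb_s(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    return mt_per_s * 1e6 * channels * bus_bytes / 1e9

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 4.9  # assumed size of the quantized model file

for rate in (2133, 3200):
    bw = peak_bandwidth_gb_s(rate)
    print(f"{rate} MT/s: {bw:.1f} GB/s peak, <= {max_tokens_per_s(bw, MODEL_GB):.1f} t/s")
```

Sustained bandwidth is always below peak, so real generation speed should land under this ceiling (about 10 t/s at 3200 MT/s versus about 7 t/s at 2133 MT/s), which is why faster RAM helps here.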
Software-wise, I use llamafile because on the CPU it's 10-20% faster than llama.cpp for prompt processing.
Performance with "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf":
CPU-only: 23.47 t/s (processing), 8.73 t/s (generation)
GPU: 941.5 t/s (processing), 29.4 t/s (generation)
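To put those numbers in perspective, here is a small sketch that computes the GPU-over-CPU speedup for each phase and estimates the wall time of one request. The 2000-token prompt and 300-token reply are a hypothetical workload I picked, and the estimate assumes constant throughput and ignores model load time.

```python
# Measured throughput from the post (tokens/s), Meta-Llama-3.1-8B-Instruct-Q4_K_M.
RESULTS = {
    "cpu": {"processing": 23.47, "generation": 8.73},
    "gpu": {"processing": 941.5, "generation": 29.4},
}

def speedup(phase: str) -> float:
    """GPU throughput divided by CPU-only throughput for one phase."""
    return RESULTS["gpu"][phase] / RESULTS["cpu"][phase]

def request_seconds(device: str, prompt_tokens: int, output_tokens: int) -> float:
    """Estimated wall time: process the prompt, then generate the reply.
    Ignores model load time and assumes constant throughput (a simplification)."""
    r = RESULTS[device]
    return prompt_tokens / r["processing"] + output_tokens / r["generation"]

for phase in ("processing", "generation"):
    print(f"{phase}: GPU is {speedup(phase):.1f}x faster")

# Hypothetical request: 2000-token prompt, 300-token reply.
for device in ("cpu", "gpu"):
    print(f"{device}: {request_seconds(device, 2000, 300):.1f} s")
```

Under those assumptions the CPU-only run takes about two minutes, most of it spent on the prompt, while the GPU run takes around 12 seconds. The gap is much larger for prompt processing (about 40x) than for generation (about 3.4x).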