car | 8 days ago
Building llama.cpp from source with CUDA enabled should get you pretty far. llama-server has a really good web UI, and the latest version supports model switching. As for models, there are plenty of GGUF quants (down to 2-bit) available on HF and ModelScope.
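A minimal sketch of what that build-and-serve flow can look like, assuming a recent llama.cpp checkout with CMake and the CUDA toolkit installed; the exact flag names (GGML_CUDA, -ngl) and repo path may differ across versions, so check the project's README for your checkout:

  # clone and build llama.cpp with CUDA support
  # (recent versions use GGML_CUDA; older ones used LLAMA_CUBLAS)
  git clone https://github.com/ggml-org/llama.cpp
  cd llama.cpp
  cmake -B build -DGGML_CUDA=ON
  cmake --build build --config Release -j

  # serve a local GGUF model; web UI is then reachable at http://localhost:8080
  # -ngl 99 offloads as many layers as possible to the GPU
  ./build/bin/llama-server -m /path/to/model.gguf --port 8080 -ngl 99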