GitHub is showing me the unicorn — is there a Linux equivalent? I have an old ThinkPad with a puny Nvidia GPU; can I hope to find anything useful to run on that?
Building llama.cpp from source with CUDA enabled should get you pretty far. llama-server ships a solid web UI, and the latest version supports model switching.
As for models, there are plenty of GGUF quants (down to 2-bit) available on Hugging Face and ModelScope.
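If it helps, here's roughly what the build and launch look like (a sketch — the model filename is a placeholder, and how many layers you can offload with `-ngl` depends on your VRAM):

```shell
# Clone and build llama.cpp with the CUDA backend enabled
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Serve a GGUF model; -ngl controls how many layers go to the GPU
# (start low on a small GPU and raise it until you run out of VRAM)
./build/bin/llama-server -m ./models/your-model-Q4_K_M.gguf -ngl 16 --port 8080
```

Then open http://localhost:8080 in a browser for the web UI.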