reach-vb's comments

reach-vb | 1 year ago | on: Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser
Brilliant job! I love how fast it is. If the rapid pace of speech ML continues, I'm sure we'll have speech-to-speech models running directly in our browsers!

reach-vb | 9 months ago | on: Show HN: I made a VRAM Calculator in Hugging Face
Nice! That's very cool!

reach-vb | 8 months ago | on: SmolLM3: Smol, multilingual, long-context reasoner LLM
The easiest option is to build llama.cpp from source: https://github.com/ggml-org/llama.cpp
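In case it helps, the standard CMake build flow from the llama.cpp README looks like this (the GGUF repo passed to `-hf` below is just an illustration; substitute whichever SmolLM3 GGUF conversion you want to use):

```shell
# Clone and build llama.cpp from source (standard CMake flow)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Start an interactive chat, pulling a model straight from the Hugging Face Hub.
# The repo name here is illustrative, not an official recommendation.
./build/bin/llama-cli -hf ggml-org/SmolLM3-3B-GGUF
```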
If you want to avoid that, I added SmolLM3 to MLX-LM as well:
You can run it via `mlx_lm.chat --model "mlx-community/SmolLM3-3B-bf16"`
(requires the latest mlx-lm to be installed)
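For a one-off generation instead of an interactive chat, mlx-lm also ships a `mlx_lm.generate` entry point; a minimal sketch (the prompt is just an example):

```shell
# Install/upgrade mlx-lm, then generate a single completion
pip install -U mlx-lm
mlx_lm.generate --model "mlx-community/SmolLM3-3B-bf16" \
  --prompt "Write a haiku about small language models"
```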
Here's the MLX-LM PR if you're interested: https://github.com/ml-explore/mlx-lm/pull/272
Similarly, the llama.cpp PR: https://github.com/ggml-org/llama.cpp/pull/14581
Let me know if you face any issues!