Ship_Star_1010 | 1 year ago | on: Run Llama locally with only PyTorch on CPU
PyTorch has a native LLM solution.
It supports all the Llama models, and it runs on CPU, MPS, and CUDA.
https://github.com/pytorch/torchchat
I'm getting 4.5 tokens per second with Llama 3.1 8B at full precision, CPU only, on my M1.
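If I remember the repo's README right, basic usage is roughly the following (commands may have changed, so check the README; the model alias and prompt here are just examples):

    # download model weights (requires a Hugging Face token for Llama)
    python3 torchchat.py download llama3.1

    # interactive chat in the terminal
    python3 torchchat.py chat llama3.1

    # one-off generation from a prompt
    python3 torchchat.py generate llama3.1 --prompt "Write a haiku about CPUs"

It picks a sensible device by default, which is how I ended up on CPU-only on the M1.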