item 36876601


bed147373429 | 2 years ago

I have a question: last week I downloaded llama-7b-chat directly from Meta's GitHub repo (https://github.com/facebookresearch/llama), using the URL they sent via e-mail. As a result, I now have the model as consolidated.00.pth.

Your commands assume the model is a .bin file (so I guess there must be a way to convert the PyTorch .pth model to a .bin file). How can I do this, and what is the difference between the two formats?
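(Side note, not from the thread: the two files are different containers for the same weights. consolidated.00.pth is a PyTorch checkpoint, which modern torch.save() writes as a zip archive, while the .bin file llama.cpp loads is a ggml-family file with its own binary header. A quick way to tell which kind of file you have is to look at the first four bytes. The magic values below are ones I believe are correct for these formats, but the ggml magic has changed across llama.cpp versions, so treat this as an illustrative sketch rather than an exhaustive detector.)

```python
import struct

# First-four-byte magics, read as a little-endian uint32:
#   0x04034B50 -> "PK\x03\x04": a zip archive, the container modern
#                 torch.save() checkpoints (like consolidated.00.pth) use.
#   0x67676D6C -> legacy ggml files, as produced by convert-pth-to-ggml.py.
#   0x46554747 -> "GGUF", the newer llama.cpp model format.
MAGICS = {
    0x04034B50: "PyTorch checkpoint (zip container)",
    0x67676D6C: "ggml model file",
    0x46554747: "GGUF model file",
}

def identify_model_file(path):
    """Read the first 4 bytes of a file and map them to a known family."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) < 4:
        return "unknown (file too short)"
    (magic,) = struct.unpack("<I", head)
    return MAGICS.get(magic, "unknown format")
```

For example, `identify_model_file("consolidated.00.pth")` should report the PyTorch zip container, while a converted model reports one of the ggml/GGUF families — which is why llama.cpp ships a conversion script rather than loading the .pth directly.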

The Facebook repo provides commands for running the models, but those commands don't work on my Windows machine: "NOTE: Redirects are currently not supported in Windows or MacOs. [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to ...."

The Facebook repo does not say which OS you are supposed to use, so I assumed it would work on Windows too. But if that worked, why would anyone need ggerganov's llama.cpp? I am new to all of this and easily confused, so any help is appreciated.


shortrounddev2|2 years ago

To be perfectly honest, I know absolutely nothing about AI or Llama; I'm just a Windows C++ programmer, so I wanted to provide CMake instructions for Windows, sorry. The .bin file is what I got from the OP's link.

bed147373429|2 years ago

It's ok, I just followed your instructions and with that model it works well. But are you sure this uses CUDA? My CPU utilization is at 50% while my GPU utilization sits at 1% while the output is being generated.

spike_protein|2 years ago

try *cd llama.cpp && python convert-pth-to-ggml.py models/7B/ 1*