It only takes away choice if you use the demo files with the models baked in. Under Releases -> Assets there are versions that are just the actual OS-portable llama.cpp binaries, which you pass a model file path to as normal.
I'd like to adapt one of the provided LLMs with my own data; I've heard that RAG can be used for that. Does anyone have pointers on how this could be achieved with llamafiles, all locally on my server?
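A minimal sketch of how RAG works locally, with some assumptions: RAG does not retrain the model; it retrieves relevant snippets from your data and prepends them to the prompt you send to the running llamafile. The bag-of-words "embedding" below is a toy stand-in (a real setup would use a proper embedding model), and the final HTTP step to the llamafile's local server is left as a comment since endpoint details vary by version.

```python
# Toy local-RAG sketch: retrieve the most relevant document for a query,
# then build a prompt that a local llamafile server could answer from.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words vector; a real setup would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The nightly backup job runs at 02:00 via cron.",
    "The cafeteria menu rotates weekly.",
]
question = "when does the nightly backup run?"
context = retrieve(question, docs, k=1)
prompt = f"Context:\n{context[0]}\n\nQuestion: {question}\nAnswer:"
# `prompt` would then be POSTed to the llamafile's local HTTP server.
```

The key point: your data stays outside the model, so you can update the document set at any time without touching the llamafile itself.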
wokwokwok|2 years ago
It's unsafe and it takes all the choice and control away from you.
You should, instead:
1) Build a local copy of llama.cpp (literally clone https://github.com/ggerganov/llama.cpp and run 'make').
2) Download the model version you actually want from hugging face (for example, from https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGU..., with the clearly indicated required RAM for each variant)
3) Run the model yourself.
I'll say this explicitly: these llamafile things are stupid.
You should not download arbitrary user uploaded binary executables and run them on your local laptop.
Hugging Face may do its best to prevent people from taking advantage of this (heck, they literally invented safetensors), but long story short: we can't have nice things because people suck.
If you start downloading random executables from the internet and running them, you will regret it.
Just spend the extra 5 minutes to build llama.cpp yourself. It's very, very easy to do and many guides already exist for doing exactly that.
superkuh|2 years ago
Compiling llama.cpp is relatively easy. Compiling llama.cpp with GPU support is a bit harder. I think it's nice that these OS-portable binaries of llama.cpp applications like main, server, and llava exist. Too bad there are no OpenCL ones. The only problem was baking in the models. Downloading applications off the internet is not that weird; after all, it's the recommended way to install Rust, etc.
unknown|2 years ago
[deleted]
senthil_rajasek|2 years ago
Llamafile is the new best way to run a LLM on your own computer (simonwillison.net)
https://news.ycombinator.com/item?id=38489533
And
https://news.ycombinator.com/item?id=38464057
dang|2 years ago
Llamafile is the new best way to run a LLM on your own computer - https://news.ycombinator.com/item?id=38489533 - Dec 2023 (45 comments)
Llamafile lets you distribute and run LLMs with a single file - https://news.ycombinator.com/item?id=38464057 - Nov 2023 (286 comments)
Akashic101|2 years ago
paolop|2 years ago
superkuh|2 years ago
The llava multi-modal models are fun. I find that requesting JSON-formatted output lets you overcome the limited response length baked in. https://huggingface.co/mys/ggml_bakllava-1 (a CLIP+Mistral-7B instead of CLIP+LLaMA-2-7B) is my favorite.
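A hedged sketch of the JSON-output trick: llamafiles embed a llama.cpp-style HTTP server, and the `/completion` endpoint with `"prompt"` and `"n_predict"` fields below follows llama.cpp's server API, which may differ between llamafile versions (image payload fields for llava are omitted here). The idea is simply to ask for a JSON object in the prompt and raise the token budget.

```python
# Sketch: build a request that asks a local llava llamafile for JSON output.
import json
import urllib.request

def build_request(question, n_predict=512):
    # Asking for a JSON object tends to keep the model's answer structured
    # and lets you raise the response-length cap via n_predict.
    prompt = (
        "Describe the attached image. Respond ONLY with a JSON object "
        'of the form {"description": ..., "objects": [...]}.\n'
        + question
    )
    return {"prompt": prompt, "n_predict": n_predict}

def post(body, url="http://localhost:8080/completion"):
    # POST the JSON body to the llamafile's local server.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_request("What is in this picture?")
# post(body)  # uncomment with a llamafile server running locally
```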
bugglebeetle|2 years ago
aldarisbm|2 years ago
gapchuboy|2 years ago