horsawlarway|2 years ago
Which won't run everything, but will run models in the GGML format such as https://huggingface.co/TheBloke/llama-65B-GGML
The steps are basically:
1. Download a model
2. Make sure you have the latest nvidia driver for your machine, along with the cuda toolkit. This will vary by OS but is fairly easy on most linux distros.
3. Compile https://github.com/ggerganov/llama.cpp following their instructions (in particular, look for the LLAMA_CUBLAS flag for enabling GPU support)
4. Run the model following their instructions. There are several flags that are important, but you can also just use their server example that was added a few days ago - it gives a fairly solid chat interface.
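The steps above can be sketched as a shell session. This is a sketch, not a verbatim recipe: the exact model filename and the -ngl layer count are assumptions you'd adjust for your hardware, and the build flag and binary names reflect llama.cpp as of the GGML era described here.

```shell
# Assumes: recent NVIDIA driver + CUDA toolkit already installed (step 2).

# Step 1: download a quantized GGML model that fits your VRAM/RAM.
# (Filename below is a hypothetical example from the linked repo.)
# wget https://huggingface.co/TheBloke/llama-65B-GGML/resolve/main/llama-65b.ggmlv3.q4_0.bin

# Step 3: build llama.cpp with cuBLAS-backed GPU support.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1

# Step 4: run it. -ngl offloads that many layers to the GPU;
# raise or lower it to match your card's memory.
./main -m ../llama-65b.ggmlv3.q4_0.bin -ngl 40 -c 2048 --interactive-first

# Or use the bundled server example for a chat interface in the browser
# (serves on http://localhost:8080 by default).
./server -m ../llama-65b.ggmlv3.q4_0.bin -ngl 40
```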
frognumber|2 years ago
1) Go to https://gpt4all.io/index.html
2) Click the downloader for your OS
3) Run the installer
4) Run gpt4all, and wait for the obnoxiously slow startup time
... and that's it. On my machine, it works perfectly well -- about as fast as the web service version of GPT. I have a decent GPU, but I never checked if it's using it, since it's fast enough.