a20eac1d | 2 years ago
Right now I'm running CPU-only Llama, but only the 17B version; I believe it's based on llama.cpp. How do I mix CPU and GPU together for more performance?
brucethemoose2 | 2 years ago
Build llama.cpp with GPU support enabled, then offload as many layers as you can to the GPU with the GPU-layers flag (`-ngl` / `--n-gpu-layers`). You will have to experiment with the layer count while watching your GPU's VRAM usage.
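A minimal command-line sketch of what this looks like with llama.cpp. The build flag name has changed across versions (`LLAMA_CUBLAS=1` in older trees, `GGML_CUDA=1` / `cmake -DGGML_CUDA=ON` in newer ones), and the model path and layer count below are illustrative, not prescriptive:

```shell
# Build llama.cpp with CUDA support (flag name varies by version;
# older trees used: make LLAMA_CUBLAS=1)
make GGML_CUDA=1

# Run with partial GPU offload: -ngl sets how many transformer layers
# go to the GPU; the rest stay on the CPU. Lower the number if you
# run out of VRAM (watch nvidia-smi while the model loads).
./main -m models/llama-13b.Q4_K_M.gguf -ngl 35 -p "Hello"
```

Start with a high `-ngl` value, and if loading fails or VRAM is exhausted, reduce it until the model fits; every layer that lands on the GPU speeds up inference over pure CPU.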