ghoomketu | 2 years ago
I wish they would do a behind-the-scenes on how much money, time, and optimisation goes into making this all work.
Also a big fan of Anyscale. Their pricing is phenomenal for running models like Mixtral; not sure how they are so affordable.
ilaksh|2 years ago
It builds very quickly with make. But if inference is slow when you try it, make sure to enable any CUDA-related build flags and then rebuild.
A key parameter is the one that tells it how many layers to offload to the GPU: -ngl, I think.
Also, download the 4-bit GGUF from HuggingFace and try that; it uses much less memory.
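A minimal sketch of those steps, assuming this refers to llama.cpp as it was around the time of this thread (the LLAMA_CUBLAS flag, the ./main binary name, and the model filename are my assumptions; check the repo's README for the current equivalents):

  # Build with the CUDA backend enabled
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  make LLAMA_CUBLAS=1

  # -ngl / --n-gpu-layers sets how many layers are offloaded to the GPU.
  # A 4-bit GGUF quant (e.g. Q4_K_M) needs roughly a quarter of the
  # memory of the fp16 weights.
  ./main -m ./models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -ngl 35 -p "Hello"

The higher you can push -ngl while still fitting in VRAM, the faster generation gets; if you hit out-of-memory errors, lower it.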
mgreg|2 years ago
1. https://www.semianalysis.com/p/inference-race-to-the-bottom-...
ignoramous|2 years ago
It's obviously still a nascent area, but the LMSYS blog (https://lmsys.org/blog) does a good job of diving into the engineering challenges behind running these LLMs.
(I'm sure there are others)