Deploying LLaMA 3 8B is fairly easy but LLaMA 3 70B is another beast. Given the amount of VRAM needed you might want to provision more than one GPU and use a dedicated inference server like vLLM in order to split your model on several GPUs.
LLaMA 3 8B requires around 16GB of disk space and 20GB of VRAM (GPU memory) in FP16. As for LLaMA 3 70B, it requires around 140GB of disk space and 160GB of VRAM in FP16.
I hope it is useful, and if you have questions please don't hesitate to ask!
juliensalinas|1 year ago
Deploying LLaMA 3 8B is fairly easy but LLaMA 3 70B is another beast. Given the amount of VRAM needed you might want to provision more than one GPU and use a dedicated inference server like vLLM in order to split your model on several GPUs.
LLaMA 3 8B requires around 16GB of disk space and 20GB of VRAM (GPU memory) in FP16. As for LLaMA 3 70B, it requires around 140GB of disk space and 160GB of VRAM in FP16.
I hope it is useful, and if you have questions please don't hesitate to ask!
Julien