item 40078736

namanski | 1 year ago

I just hosted both models here: https://chat.tune.app/

Playground: https://studio.tune.app/


ChristophGeske | 1 year ago

Thanks for the link. I just tested them and they also work in Europe without needing to start a VPN. What specs are needed to run these models, i.e. the Llama 70B and the Wizard 8x22B model? On your site they run very nicely and the answers they provide are really good; they both passed my small test, and I would love to run one of them locally. So far I have only run 8B models on my 16GB RAM PC using LM Studio, but having such good models run locally would be awesome, and I would upgrade my RAM for that. My PC has a 3080 laptop GPU and I can increase the RAM to 64GB. As I understand it, a 70B model needs around 64 GB, but maybe only if it is quantized. Can you confirm that? Can I run Llama 3 as well as you do if I simply upgrade my RAM sticks? Or are you running it in the cloud and can't say much about the requirements for Windows PC users? Or do you have hardware usage data for all the models on your site and can tell us what they need to run?
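The back-of-the-envelope arithmetic behind the "~64 GB for a quantized 70B model" question can be sketched like this. The 1.2x overhead factor (for the KV cache and runtime buffers) is an assumption for illustration, not a measured number; real usage varies by runtime and context length:

```python
# Rough memory estimate for holding a model's weights at various
# quantization levels. Ballpark only: real runtimes add overhead
# for the KV cache, activations, and framework buffers.

def model_memory_gb(params_billion, bits_per_weight, overhead=1.2):
    """Approximate RAM/VRAM (decimal GB) needed for the weights."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

for bits, name in [(16, "fp16"), (8, "q8"), (4, "q4")]:
    print(f"70B @ {name}: ~{model_memory_gb(70, bits):.0f} GB")
# fp16 lands around 168 GB, q8 around 84 GB, q4 around 42 GB
```

Which is why a 70B model is only plausible on a 64GB machine after 4-bit (or similar) quantization: the fp16 weights alone are far larger than the RAM.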

namanski | 1 year ago

Hey Christoph, thanks for trying it out - we're running this on the cloud, particularly GCP, on A100s (80g).

On your query about running these models locally: I'm not sure that just upgrading your RAM would give you the same throughput you see on the website. You can upgrade your RAM, but you might get pretty low tokens/sec.