item 36976620

weichiang | 2 years ago
Say you run an A10G at ~$1.20/hr with full utilization on vLLM (112 reqs/min): that works out to ~$0.00018 per request, versus GPT-3.5 Turbo at $0.002 per 1K tokens.

npsomaratna | 2 years ago
Quick question: what would you estimate the running cost of Llama 2 70B to be? (On GPU, and assuming maximum utilization?)

cpill | 2 years ago
Yeah, that's the real question here.

unknown | 2 years ago
[deleted]
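The per-request figure in the top comment can be sanity-checked with a few lines. This is a minimal sketch using only the numbers quoted in the thread (A10G at ~$1.20/hr, vLLM sustaining 112 requests/min); actual cloud pricing and throughput will vary.

```python
# Back-of-envelope check of the top comment's per-request cost claim.
gpu_cost_per_hour = 1.20      # USD/hr for an A10G, as quoted in the thread
requests_per_minute = 112     # vLLM throughput at full utilization, as quoted

requests_per_hour = requests_per_minute * 60        # 6,720 requests/hr
cost_per_request = gpu_cost_per_hour / requests_per_hour

print(f"${cost_per_request:.6f} per request")  # ~$0.000179, i.e. ~$0.00018
```

At roughly $0.00018 per request, a self-hosted request is cheaper than a single 1K-token GPT-3.5 Turbo call ($0.002) by about an order of magnitude, which is the comparison the submitter is drawing.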