top | item 36976620


weichiang | 2 years ago

Say an A10G costs ~$1.20/hr and, with full utilization on vLLM, serves 112 requests/min. That works out to ~$0.00018 per request, versus GPT-3.5 Turbo at $0.002 per 1K tokens.
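A minimal sketch of the arithmetic behind this estimate, using the figures from the comment above (the prices and throughput are the comment's assumptions, not measured values):

```python
# Back-of-envelope cost per request on a self-hosted A10G running vLLM.
# Assumes full, sustained utilization; ignores batching and latency effects.
a10g_cost_per_hour = 1.20      # USD/hr, approximate on-demand price from the comment
requests_per_min = 112         # throughput figure claimed in the comment

cost_per_request = a10g_cost_per_hour / (requests_per_min * 60)
print(f"A10G cost per request: ${cost_per_request:.5f}")  # ~$0.00018

# Note the units differ: GPT-3.5 Turbo is priced per 1K tokens, not per request.
# At $0.002 per 1K tokens, $0.00018 buys roughly 90 tokens, so the comparison
# only favors self-hosting when requests average more than ~90 tokens.
gpt35_cost_per_1k_tokens = 0.002
breakeven_tokens = cost_per_request / gpt35_cost_per_1k_tokens * 1000
print(f"Break-even request size: ~{breakeven_tokens:.0f} tokens")
```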


npsomaratna | 2 years ago

Quick question: what would you estimate the running cost of Llama 2 70B to be (on GPU, assuming maximum utilization)?

cpill | 2 years ago

yeah, that's the real question here