moonchrome | 2 years ago
Not to mention OpenAI has shit latency and terrible reliability; if you care about that, you should be using the Azure-hosted models, though pricing there is also higher.
I'd say fixed costs and development time favor OpenAI, but I've seen people post solid practical comparisons of latency and cost using hosted fine-tuned small models.
moonchrome|2 years ago
From what I've read, a 4090 should blow an A100 away if you can fit within 22GB of VRAM, which a 7B model should do comfortably.
And the latency (along with its variability and availability) of the OpenAI API is terrible because of the load they're getting.
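A back-of-envelope sketch of why a 7B model fits comfortably: at fp16 that's roughly 2 bytes per parameter, so about 13 GiB of weights, well under 22GB even before quantization (illustrative numbers only; real usage also includes the KV cache, activations, and framework overhead).

```python
def model_vram_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GiB; ignores KV cache and overhead."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

fp16 = model_vram_gb(7, 2)    # fp16/bf16 weights: 2 bytes per parameter
int4 = model_vram_gb(7, 0.5)  # 4-bit quantized weights: 0.5 bytes per parameter

print(f"7B @ fp16:  {fp16:.1f} GiB")  # ~13.0 GiB, fits in 22 GiB with headroom
print(f"7B @ 4-bit: {int4:.1f} GiB")  # ~3.3 GiB
```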
halflings|2 years ago
I was surprised by how fast it runs on an M2 MBP with llama.cpp; way, way faster than ChatGPT, and that's not even using the Apple Neural Engine.
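For reference, a minimal sketch of trying this yourself on Apple Silicon. The model path is a placeholder, and the exact flags depend on your llama.cpp version (older builds needed `LLAMA_METAL=1 make` to enable the Metal GPU backend; check the repo's README for your checkout):

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make   # on some versions: LLAMA_METAL=1 make, to enable the Metal backend

# Run a quantized 7B model (path is a placeholder for your own model file)
./main -m ./models/7B/ggml-model-q4_0.gguf \
       -p "Explain attention in one paragraph." \
       -n 128 -t 8   # -n: tokens to generate, -t: CPU threads
```

The 4-bit quantized weights are only a few GiB, so they sit entirely in the M2's unified memory, which is a big part of why it feels so fast locally.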