badFEengineer | 2 years ago
* Looks like for gpt-4 turbo (https://artificialanalysis.ai/models/gpt-4-turbo-1106-previe...), there was a huge latency spike on December 28, which is pulling the average latency way up. Perhaps dropping the top and bottom 10% of requests would help the average (or switch over to the median and include variance).
* Adding latency variance would be truly awesome. I've run into issues with some LLM API providers that have incredibly high variance, but I haven't seen concrete data across providers.
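The robust statistics suggested above can be sketched with a small helper. This is a hypothetical illustration (the `latency_summary` function and the sample data are mine, not from the site): a single spike dominates the plain mean but barely moves the trimmed mean or median.

```python
import statistics

def latency_summary(latencies, trim_frac=0.10):
    """Summarize request latencies (seconds).

    Drops the top and bottom `trim_frac` of samples before averaging,
    and also reports the median and variance, as suggested above.
    """
    xs = sorted(latencies)
    k = int(len(xs) * trim_frac)
    trimmed = xs[k:len(xs) - k] if k else xs
    return {
        "mean": statistics.fmean(xs),
        "trimmed_mean": statistics.fmean(trimmed),
        "median": statistics.median(xs),
        "variance": statistics.pvariance(xs),
    }

# One 30 s outlier among ~1.1 s requests: the mean jumps to ~4 s,
# while the trimmed mean and median stay near 1.15 s.
samples = [1.1, 1.2, 1.0, 1.3, 1.2, 1.1, 1.0, 1.2, 1.1, 30.0]
summary = latency_summary(samples)
```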
Gcam | 2 years ago
AaronFriel | 2 years ago
It would be interesting to see request latency and throughput when API calls occur cold (first data point), and then once per hour, per minute, and per second, with the first N samples dropped.
Also, at least with Azure OpenAI, the AI safety features (filtering & annotations) make a significant difference in time to first token.
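The cold-vs-warm measurement described above could be sketched like this. Everything here is a hypothetical harness, not the site's actual methodology: `measure` takes any zero-argument callable that issues one request, and the warmup calls absorb cold-start effects (connection setup, provider-side warming) before timing begins.

```python
import time

def measure(call, n_warmup=3, n_samples=10):
    """Time `call` after dropping the first `n_warmup` invocations.

    `call` is a zero-argument function that issues one API request.
    The warmup calls are discarded so cold-start latency (first data
    point) can be reported separately from steady-state samples.
    """
    for _ in range(n_warmup):
        call()  # discarded: absorbs cold-start effects
    samples = []
    for _ in range(n_samples):
        t0 = time.perf_counter()
        call()
        samples.append(time.perf_counter() - t0)
    return samples

# Usage with a stand-in for a real API request:
latencies = measure(lambda: time.sleep(0.001), n_warmup=2, n_samples=5)
```

To capture the once-per-hour / per-minute / per-second comparison, the same harness would be run at each cadence and the resulting sample sets compared.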