Yeah at $WORK we use various LLM APIs to analyze text; it's not heavy usage in terms of tokens but maybe 10K calls per day. We've found that response times vary a lot, sometimes going over a minute for simple tasks, and random fails happen. Retry logic is definitely mandatory, and it's good to have multiple providers ready. We're abstracting calls across three different APIs (openai, gemini and mistral, btw we're getting pretty good results with mistral!) so we can switch workloads quickly if needed.
jwillp|2 months ago
duckmysick|2 months ago
phantasmish|2 months ago
aamoscodes|2 months ago