solaxun | 1 year ago

On the latency of the first request: how is the CFG cached?

Is it cached at the API key + schema level? That is, for a given API key, is the latency penalty for a new schema paid only once, regardless of how far apart the requests are? Or is it cached for a shorter duration, e.g. per session or per conversation thread?
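
To make the two options concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: `GrammarCache`, its `ttl_seconds` parameter, and the tuple standing in for a compiled grammar are my own illustration, not the provider's actual implementation. With `ttl_seconds=None` the compile cost for a given (API key, schema) pair is paid once; with a TTL it is paid again whenever requests are spaced further apart than the TTL.

```python
import hashlib
import json
import time


class GrammarCache:
    """Hypothetical cache of compiled grammars, keyed on (API key, schema hash)."""

    def __init__(self, ttl_seconds=None, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # None means entries never expire
        self.clock = clock              # injectable so expiry can be tested
        self.entries = {}               # key -> (compiled grammar, time stored)
        self.compile_count = 0          # how many times the slow path ran

    def _key(self, api_key, schema):
        # Canonicalize the schema so property order does not change the hash.
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return (api_key, digest)

    def get(self, api_key, schema):
        key = self._key(api_key, schema)
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None:
            compiled, stored_at = hit
            if self.ttl_seconds is None or now - stored_at < self.ttl_seconds:
                return compiled
        # Miss or expired entry: the first-request latency is paid here.
        self.compile_count += 1
        compiled = ("compiled", key[1])  # stand-in for the slow CFG build
        self.entries[key] = (compiled, now)
        return compiled
```

The question is which of these behaviors applies, and if there is a TTL, what scope and duration it has.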
