roh26it | 1 year ago
So there are two levels of cache: the LLM response itself can be cached (both simple exact-match and semantic), and the guardrail response can be cached as well. We use a mix of a distributed KV store and a vector DB to actually store the data.
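A minimal sketch of how those two cache levels can fit together. All names and the similarity threshold here are illustrative, not from any real gateway codebase: the exact-match level would normally be a distributed KV store (e.g. Redis) keyed by a hash of the prompt, and the semantic level a vector DB queried by embedding similarity. Both are replaced with in-memory stand-ins and a toy character-frequency "embedding" so the example is self-contained.

```python
import hashlib

def embed(text: str) -> list[float]:
    # Toy "embedding": normalized character-frequency vector.
    # A real system would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class TwoLevelCache:
    """Level 1: exact-match lookup (stand-in for a distributed KV store).
    Level 2: semantic lookup over embeddings (stand-in for a vector DB)."""

    def __init__(self, semantic_threshold: float = 0.95):
        self.kv = {}        # prompt-hash -> cached response
        self.vectors = []   # list of (embedding, cached response)
        self.threshold = semantic_threshold

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        # Level 1: cheap exact match on the hashed prompt.
        hit = self.kv.get(self._key(prompt))
        if hit is not None:
            return hit
        # Level 2: nearest-neighbour scan over stored embeddings.
        q = embed(prompt)
        for vec, response in self.vectors:
            if cosine(q, vec) >= self.threshold:
                return response
        return None  # full cache miss: caller falls through to the LLM

    def put(self, prompt: str, response: str):
        self.kv[self._key(prompt)] = response
        self.vectors.append((embed(prompt), response))

cache = TwoLevelCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("What is the capital of France?"))  # exact-match hit
print(cache.get("what is the capital of france"))   # semantic hit
```

A guardrail-response cache can follow the same level-1 pattern, keyed on a hash of the input plus the guardrail configuration, since guardrail verdicts are usually deterministic for a given input.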