item 44239614
visiondude | 8 months ago It always seemed to me that efficient caching strategies could greatly reduce costs… wonder if they cooked up something new.
hadlock|8 months ago I've asked it a question not in its dataset three different ways, and I see the same three sentences in the response, word for word, which could imply it's caching the core answer. I hadn't seen this behavior before this last week.
HugoDias|8 months ago This document explains the process very well. It's a good read: https://platform.openai.com/docs/guides/prompt-caching
amanda99|8 months ago You would use a KV cache to cache a significant chunk of the inference work.
koakuma-chan|8 months ago A lot of the prompt is always the same: the instructions, the context, the codebase (if you are coding), etc.
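The two comments above describe prefix-based KV caching: since the system prompt, instructions, and context form an identical prefix across many requests, the key/value computation for that prefix can be done once and reused. Here is a toy sketch of the idea, assuming nothing about any specific provider's implementation; `PrefixKVCache` and `_compute_kv` are invented names standing in for the real attention-layer projections.

```python
# Toy illustration of prefix-based KV caching: the per-token key/value
# computation for a shared prompt prefix is done once and reused, so a
# second request sharing that prefix only pays for its new suffix.
class PrefixKVCache:
    def __init__(self):
        # maps a tuple of prefix tokens -> list of per-token (key, value) pairs
        self._cache = {}

    def _compute_kv(self, token):
        # Stand-in for the expensive per-token key/value projection.
        return ("k_" + token, "v_" + token)

    def get_kv(self, tokens):
        """Return KV entries for `tokens` plus how many were freshly computed."""
        # Reuse the longest previously cached prefix, if any.
        best = 0
        kv = []
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._cache:
                kv = list(self._cache[tuple(tokens[:n])])
                best = n
                break
        # Compute KV only for the uncached suffix tokens.
        for tok in tokens[best:]:
            kv.append(self._compute_kv(tok))
        # Cache every prefix so future requests can branch at any point.
        for n in range(best + 1, len(tokens) + 1):
            self._cache[tuple(tokens[:n])] = kv[:n]
        return kv, len(tokens) - best

cache = PrefixKVCache()
system = ["You", "are", "a", "helpful", "assistant"]
kv1, fresh1 = cache.get_kv(system + ["how", "to", "cook", "pasta"])
kv2, fresh2 = cache.get_kv(system + ["how", "to", "boil", "eggs"])
print(fresh1, fresh2)  # 9 2 -- only the two new suffix tokens were recomputed
```

In a real serving stack the cached entries are the attention key/value tensors per layer, and reusing them skips the prefill compute for the shared prefix, which is where the cost savings come from.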
tasuki|8 months ago > Every prompt would be different
No? E.g. "how to cook pasta" is probably asked a lot.