samwho | 2 months ago
There’s been some research into how to cache chunks in the middle, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.
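Since providers today only match on a shared *prefix*, the practical workaround is to put all stable content first and the per-request content last. A minimal sketch (all names here are hypothetical, not any provider's API):

```python
# Hypothetical sketch: prefix-based prompt caches key on a byte-identical
# leading portion of the prompt, so stable content (system prompt, reference
# docs) goes first and the per-request query goes last.

STATIC_SYSTEM_PROMPT = "You are a helpful assistant."  # identical every call
STATIC_CONTEXT = "Reference document...\n" * 50        # large, unchanging

def build_messages(user_query: str) -> list[dict]:
    """Order messages so the long, unchanging prefix is identical across
    requests; only the final user message varies."""
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": STATIC_CONTEXT},
        {"role": "user", "content": user_query},  # dynamic part goes last
    ]

a = build_messages("What is the capital of France?")
b = build_messages("Summarise section 2.")
# Everything before the last message is identical across the two requests,
# which is exactly what a prefix cache can reuse.
assert a[:-1] == b[:-1]
```

Caching a chunk in the *middle* of the prompt would break this prefix-matching model, which is why it needs the specially structured prompts mentioned above.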
moebrowne | 2 months ago
> Caching is available for prompts containing 1024 tokens or more.
No mention of caching being in blocks of 1024 tokens thereafter.
IanCal | 2 months ago
https://openai.com/index/api-prompt-caching/