top | item 45596090

(no title)

I vastly prefer the manual caching. There are several aspects of automatic caching that are suboptimal, with only moderately less developer burden. I don’t use Anthropic much but I wish the others had manual cache options

discuss

simonw|4 months ago

What's sub-optimal about the OpenAI approach, where you get 90% discount on tokens that you've previously sent within X minutes?

tempusalaria|4 months ago

Lots of situations, here are 2 I’ve faced recently (cannot give too much detail for privacy reasons, but should be clear enough)

1) low latency desired, long user prompt 2) function runs many parallel requests, but is not fired with common prefix very often. OpenAI was very inconsistent about properly caching the prefix for use across all requests, but with Anthropic it’s very easy to pre-fire

stavros|4 months ago

Is it wherever the tokens are, or is it the N first tokens they've seen before? Ie if my prompt is 99% the same, except for the first token, will it be cached?

thefroh|4 months ago

because you can have multiple breakpoints with Anthropic's approach, whereas with OpenAI, you only have breakpoints for what was sent.

for example if a user sends a large number of tokens, like a file, and a question, and then they change the question.