
visiondude | 8 months ago

always seemed to me that efficient caching strategies could greatly reduce costs… wonder if they cooked up something new


xmprt | 8 months ago

How are LLMs cached? Every prompt would be different so it's not clear how that would work. Unless you're talking about caching the model weights...

hadlock | 8 months ago

I've asked it a question not in its dataset three different ways, and I see the same three sentences in the response, word for word, which could imply it's caching the core answer. I hadn't seen this behavior before this last week.

amanda99 | 8 months ago

You would use a KV cache to cache a significant chunk of the inference work.
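The idea can be sketched in a toy form. This is illustrative only, not any provider's actual implementation: real inference servers cache per-token attention key/value tensors and reuse them when a new prompt shares a prefix (e.g. the same system instructions); here each "KV entry" is a stub string so we can count how much work a shared prefix skips. All names are hypothetical.

```python
# Toy sketch of prefix (KV) caching for LLM inference.
# Each "kv(token)" stub stands in for the attention key/value
# tensors that a real server would compute and store per token.

class PrefixKVCache:
    def __init__(self):
        self.cache = {}           # token-prefix tuple -> list of stub KV entries
        self.tokens_computed = 0  # bookkeeping: how many tokens we "ran"

    def run_prompt(self, tokens):
        n = len(tokens)
        kv, hit = [], 0
        # Find the longest already-cached prefix of this prompt.
        for i in range(n, 0, -1):
            entry = self.cache.get(tuple(tokens[:i]))
            if entry is not None:
                kv, hit = list(entry), i
                break
        # "Compute" KV entries only for the uncached suffix.
        for tok in tokens[hit:]:
            kv.append(f"kv({tok})")
            self.tokens_computed += 1
        # Cache every new prefix so future prompts can reuse it.
        for i in range(hit, n):
            self.cache[tuple(tokens[:i + 1])] = kv[:i + 1]
        return kv

cache = PrefixKVCache()
system = ["You", "are", "a", "helpful", "assistant", "."]

cache.run_prompt(system + ["How", "do", "I", "cook", "pasta", "?"])
first = cache.tokens_computed            # all 12 tokens computed

cache.run_prompt(system + ["What", "is", "a", "KV", "cache", "?"])
second = cache.tokens_computed - first   # only the 6 uncached suffix tokens
```

With a long shared system prompt or codebase context, the cached prefix can dominate the prompt, which is where the cost savings come from.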

koakuma-chan | 8 months ago

A lot of the prompt is always the same: the instructions, the context, the codebase (if you are coding), etc.

tasuki | 8 months ago

> Every prompt would be different

No? Eg "how to cook pasta" is probably asked a lot.
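That case is a whole-response cache rather than a KV cache. A minimal sketch, assuming deterministic (temperature-0) generation so the same normalized prompt can safely return a stored answer; `run_model` and `ask` are hypothetical stand-ins, not a real API:

```python
# Toy exact-match response cache for repeated questions.
from functools import lru_cache

calls = 0  # count how many times the "model" actually runs

def run_model(prompt):
    # Stand-in for an expensive LLM call.
    global calls
    calls += 1
    return f"answer to: {prompt}"

@lru_cache(maxsize=1024)
def cached_answer(normalized_prompt):
    return run_model(normalized_prompt)

def ask(prompt):
    # Light normalization so trivially different phrasings share a slot.
    return cached_answer(" ".join(prompt.lower().split()))

ask("How to cook pasta?")
ask("how to cook   pasta?")  # cache hit: case/whitespace normalized away
```

In practice sampling temperature and personalization make exact-match hits rarer than this suggests, which is why prefix/KV caching is the more common optimization.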