top | item 41285038


burtonator | 1 year ago

Autoregressive models can't simply resume, so they have to re-process the entire prompt on every request.

By caching that intermediate state (the attention KV cache), they can pick up where the previous request left off, bypassing all of that computation.

For large contexts this could save a ton of compute!

I think this feature and structured outputs are some of the biggest inventions in LLMs this year.
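The idea above can be sketched with a toy model. `ToyModel` is invented here purely for illustration; real providers cache attention K/V tensors, typically at block granularity, rather than hashing whole prompt prefixes like this:

```python
# Toy sketch of provider-side prompt caching -- ToyModel is hypothetical;
# real systems store attention key/value tensors, usually per block,
# not prefix hashes.

class ToyModel:
    def __init__(self):
        self.cache = set()  # prompt prefixes whose state is already cached

    def generate(self, prompt_tokens, use_cache=False):
        prompt = tuple(prompt_tokens)
        start = 0
        if use_cache:
            # Reuse the longest prefix we've already processed.
            for end in range(len(prompt), 0, -1):
                if prompt[:end] in self.cache:
                    start = end
                    break
        # Remember every prefix of this prompt for future requests.
        for end in range(1, len(prompt) + 1):
            self.cache.add(prompt[:end])
        # Only the uncached suffix costs compute.
        return len(prompt) - start

system = list(range(1000))  # a long shared system prompt
model = ToyModel()
cold = model.generate(system + [1], use_cache=True)  # pays for all 1001 tokens
warm = model.generate(system + [2], use_cache=True)  # pays for only the 1 new token
```

With a 1000-token shared prefix, the second request processes 1 token instead of 1001, which is where the compute savings on large contexts come from.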


minimaxir | 1 year ago

Prompt caching has been a thing for LLMs since GPT-2 (e.g. transformers' `use_cache=True` / `past_key_values`); it's more of a surprise that it took this long for the main LLM providers to provide a good implementation.
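The incremental-decoding loop that transformers exposes via `use_cache=True` / `past_key_values` follows roughly this pattern. `StubModel` below is invented purely to count compute per forward pass; a real model returns logits plus tensors of cached keys/values:

```python
# Minimal stub of the incremental-decoding pattern behind
# `use_cache=True` / `past_key_values` in Hugging Face transformers.
# StubModel is hypothetical; it just tracks how many tokens each
# forward pass has to process.

class StubModel:
    def forward(self, input_ids, past_key_values=None):
        past_len = past_key_values["len"] if past_key_values else 0
        cost = len(input_ids)  # only the tokens actually fed in are processed
        new_past = {"len": past_len + len(input_ids)}
        next_token = past_len + len(input_ids)  # dummy "sampled" token
        return next_token, new_past, cost

model = StubModel()
prompt = list(range(500))

# First call: run the whole prompt once and keep the returned cache.
tok, past, total = model.forward(prompt)

# Subsequent calls: feed only the single new token plus the cache,
# so each decode step costs 1 token instead of the full sequence.
for _ in range(10):
    tok, past, step = model.forward([tok], past_key_values=past)
    total += step
```

Without the cache, every decode step would re-run the full, growing sequence; with it, generating 10 tokens after a 500-token prompt costs 510 token-passes instead of roughly 5,000.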

brylie | 1 year ago

I’m building an app with OpenAI, using structured outputs. Does OpenAI also support prompt caching?