burtonator | 1 year ago
By caching them, the model resumes from where it left off, completely bypassing all that recomputation.
For large contexts this could save a ton of compute!
I think this feature and structured outputs are some of the biggest inventions in LLMs this year.
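Roughly, the mechanism looks like this (a toy Python sketch, not any provider's actual API; expensive_prefill and the hash-keyed cache are hypothetical stand-ins for the transformer's KV-cache prefill):

    import hashlib

    # Stand-in for the expensive prefill step: in a real LLM server this
    # would run the transformer over the prompt and return its KV cache.
    def expensive_prefill(prefix: str) -> dict:
        return {"state": f"kv-cache for {len(prefix)} chars"}

    _prefix_cache: dict[str, dict] = {}

    def generate(prefix: str, question: str) -> str:
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key not in _prefix_cache:
            # Pay the full prefill cost only the first time this prefix is seen.
            _prefix_cache[key] = expensive_prefill(prefix)
        state = _prefix_cache[key]  # later calls resume from the cached state
        return f"answer to {question!r} using {state['state']}"

    # The second call skips the prefill of the large shared context entirely.
    big_context = "...many thousands of tokens of documents..."
    print(generate(big_context, "What is the summary?"))
    print(generate(big_context, "List the key dates."))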