I think llama.cpp supports prompt caching via `--prompt-cache`, but it produces a very large cache file, since it has to persist the full KV state for the prompt. I'd also guess caching is expensive for any inference API provider to support, as they have to persist that state and load/unload it on every request.
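A back-of-the-envelope sketch of why the cache file gets big: the cached state has to hold the K and V tensors for every layer and every prompt token. The formula below is the standard KV-cache size estimate; the model dimensions are illustrative numbers for a Llama-2-7B-style architecture, not measured llama.cpp output.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    # Two tensors (K and V) per layer, each n_kv_heads * head_dim values per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

# Illustrative Llama-2-7B-style dims: 32 layers, 32 KV heads, head dim 128, fp16.
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, n_tokens=4096)
print(size / 2**30)  # 2.0 -> about 2 GiB just for a 4096-token prompt
```

So even a moderately long prompt can cost gigabytes per cached session, which is why persisting and reloading these files per request is painful for a provider.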