Keep up the awesome work. I've run across this problem myself - I somehow used $20 just testing a small demo I made with GPT-3.5.
As most ML is inherently probabilistic, it seems reasonable to make an LLM cache both semantic and _stochastic_, i.e. you wouldn't want the same answer every time you use "pick me a color" as a prompt. Injecting the original LLM (GPT, Bard, etc.) response as a prompt for Alpaca or some other model could make this cache virtually invisible.
The idea of incorporating stochastic behavior into the cache is fascinating, as it would indeed allow for more dynamic and diverse responses to certain types of queries. Combining different LLMs to achieve this could be an interesting approach to explore.
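The stochastic-cache idea discussed above could be sketched roughly like this. Everything here is hypothetical (the class and function names are made up, and a toy bag-of-words embedding stands in for a real embedding model): the cache stores multiple responses per "meaning" and samples randomly among them, so semantically similar prompts hit the cache without always getting the identical answer.

```python
import random

def embed(text):
    # Toy embedding: lowercase word counts (a real system would use an
    # embedding model here).
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def similarity(a, b):
    # Cosine similarity over the sparse word-count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class StochasticSemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []          # list of (embedding, [responses])
        self.threshold = threshold

    def get(self, prompt):
        """Return a randomly chosen cached response for a similar prompt, or None."""
        query = embed(prompt)
        for vec, responses in self.entries:
            if similarity(query, vec) >= self.threshold:
                return random.choice(responses)   # stochastic, not a fixed answer
        return None

    def put(self, prompt, response):
        query = embed(prompt)
        for vec, responses in self.entries:
            if similarity(query, vec) >= self.threshold:
                responses.append(response)        # same meaning: add a variant
                return
        self.entries.append((query, [response]))  # new meaning: new entry

cache = StochasticSemanticCache()
cache.put("pick me a color", "blue")
cache.put("pick me a color", "green")
cache.put("pick me a color", "crimson")

# Repeated lookups of a similar prompt return varying cached answers.
print(cache.get("pick me a color"))
```

Routing a cached response through a second, cheaper model (as suggested above) would be a further step on top of this: instead of returning the stored text verbatim, the hit would be paraphrased before being returned.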
It looks like a game-changer for those working with LLM services. By caching query results, it effectively cuts down the number of requests and tokens sent to the LLM service, leading to a substantial reduction in overall costs.
If you're leveraging LLMs for your projects, it's definitely worth giving GPTCache a look!
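The cost-saving mechanism described above can be illustrated with a minimal sketch (the `llm_call` function is a made-up stand-in for a paid API, not GPTCache's actual interface): repeated queries are served from a local cache, so only the first one reaches the billed service.

```python
calls_made = 0  # counts how many requests actually reach the paid API

def llm_call(prompt):
    # Stand-in for a billed LLM API request.
    global calls_made
    calls_made += 1
    return f"response to: {prompt}"

_cache = {}

def cached_llm_call(prompt):
    if prompt not in _cache:
        _cache[prompt] = llm_call(prompt)   # pay only on a cache miss
    return _cache[prompt]

# Ten identical queries cost one API call instead of ten.
for _ in range(10):
    cached_llm_call("summarize this document")
print(calls_made)  # → 1
```

This is exact-match caching only; the appeal of a semantic cache is that near-duplicate prompts ("summarize this doc", "give me a summary of this document") can also be served as hits.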
It's true that Python seems to be the go-to language for many LLM API wrapper projects. Its popularity in the AI and ML communities might be a contributing factor.
fzliu|2 years ago
cxie|2 years ago
tester457|2 years ago