Keep up the awesome work. I've run across this problem myself - I somehow used $20 just testing a small demo I made with GPT-3.5.
As most ML is inherently probabilistic, it seems reasonable to make an LLM cache both semantic and _stochastic_, i.e. you wouldn't want the same answer every time you use "pick me a color" as a prompt. Injecting the original LLM (GPT, Bard, etc.) response as a prompt for Alpaca or some other model could make this cache virtually invisible.
The idea of incorporating stochastic behavior into the cache is fascinating, as it would indeed allow for more dynamic and diverse responses to certain types of queries. Combining different LLMs to achieve this could be an interesting approach to explore.
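The stochastic-cache idea discussed above could be sketched roughly like this. Everything here is hypothetical (the class and function names are made up, and a toy bag-of-words embedding stands in for a real embedding model): the cache stores multiple responses per "meaning" and samples randomly among them, so semantically similar prompts hit the cache without always getting the identical answer.

```python
import random

def embed(text):
    # Toy embedding: lowercase word counts (a real system would use an
    # embedding model here).
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def similarity(a, b):
    # Cosine similarity over the sparse word-count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class StochasticSemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []          # list of (embedding, [responses])
        self.threshold = threshold

    def get(self, prompt):
        """Return a randomly chosen cached response for a similar prompt, or None."""
        query = embed(prompt)
        for vec, responses in self.entries:
            if similarity(query, vec) >= self.threshold:
                return random.choice(responses)   # stochastic, not a fixed answer
        return None

    def put(self, prompt, response):
        query = embed(prompt)
        for vec, responses in self.entries:
            if similarity(query, vec) >= self.threshold:
                responses.append(response)        # same meaning: add a variant
                return
        self.entries.append((query, [response]))  # new meaning: new entry

cache = StochasticSemanticCache()
cache.put("pick me a color", "blue")
cache.put("pick me a color", "green")
cache.put("pick me a color", "crimson")

# Repeated lookups of a similar prompt return varying cached answers.
print(cache.get("pick me a color"))
```

Routing a cached response through a second, cheaper model (as suggested above) would be a further step on top of this: instead of returning the stored text verbatim, the hit would be paraphrased before being returned.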
It looks like a game-changer for those working with LLM services. By caching query results, it effectively cuts down the number of requests and tokens sent to the LLM service, leading to a substantial reduction in overall costs.
If you're leveraging LLMs for your projects, it's definitely worth giving GPTCache a look!
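The cost-saving mechanism described above can be illustrated with a minimal sketch (the `llm_call` function is a made-up stand-in for a paid API, not GPTCache's actual interface): repeated queries are served from a local cache, so only the first one reaches the billed service.

```python
calls_made = 0  # counts how many requests actually reach the paid API

def llm_call(prompt):
    # Stand-in for a billed LLM API request.
    global calls_made
    calls_made += 1
    return f"response to: {prompt}"

_cache = {}

def cached_llm_call(prompt):
    if prompt not in _cache:
        _cache[prompt] = llm_call(prompt)   # pay only on a cache miss
    return _cache[prompt]

# Ten identical queries cost one API call instead of ten.
for _ in range(10):
    cached_llm_call("summarize this document")
print(calls_made)  # → 1
```

This is exact-match caching only; the appeal of a semantic cache is that near-duplicate prompts ("summarize this doc", "give me a summary of this document") can also be served as hits.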
It's true that Python seems to be the go-to language for many LLM API wrapper projects. Its popularity in the AI and ML communities might be a contributing factor.
fzliu|2 years ago
cxie|2 years ago
tester457|2 years ago