claudiawerner | 2 years ago
What I'd really like to see improve is having larger contexts. The current context size available for llama.cpp-compatible models is 2048 tokens, which, given the scenario and character descriptions at the start of the prompt, only gives the LLM about 3 paragraphs of memory. It just forgets anything you said more than 3 paragraphs ago, which makes for a pretty miserable longer-term RP experience unless you constantly update the summary of what's happened in the story/roleplay so far to be sent with every prompt.
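To make the trade-off concrete, here's a rough sketch of how that prompt packing works: the fixed scenario text and the running summary eat into the 2048-token window first, and only the newest turns that fit in what's left survive. All names are illustrative, and the whitespace-split token count is a crude stand-in for the model's real tokenizer.

```python
CONTEXT_TOKENS = 2048      # model context window (llama.cpp default at the time)
RESERVED_FOR_REPLY = 256   # leave headroom for the model's own generation

def count_tokens(text: str) -> int:
    # Placeholder: a real setup would use the model's actual tokenizer.
    return len(text.split())

def build_prompt(system: str, summary: str, turns: list[str]) -> str:
    """Keep the scenario text and running summary, then pack the most
    recent turns that still fit in the remaining token budget.
    Older turns simply fall off -- that's the 'forgetting'."""
    budget = CONTEXT_TOKENS - RESERVED_FOR_REPLY
    budget -= count_tokens(system) + count_tokens(summary)
    kept: list[str] = []
    for turn in reversed(turns):   # newest first
        cost = count_tokens(turn)
        if cost > budget:
            break                  # everything older than this is dropped
        kept.append(turn)
        budget -= cost
    return "\n".join([system, summary, *reversed(kept)])
```

With a chunky scenario description up front, the leftover budget covers only a handful of recent turns, which is exactly why the rolling summary has to carry everything older.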
4096-token or larger contexts running efficiently (>= 1 T/s) with 12GB/24GB consumer-grade GPU layer offloading would be fantastic, and bonus points if we can get 30B or 40B models working with that.
nar001 | 2 years ago