In ollama, how do you set up the larger context, and figure out what settings to use? I've yet to find a good guide. I'm also not quite sure how I should figure out what those settings should be for each model.
There's context length, but then, how does that relate to input length and output length? Should I just make the numbers match? 32k is 32k? Any pointers?
Ollama breaks for me. If I manually set the context higher. The next api call from clone resets it back.
And ollama keeps taking it out of memory every 4 minutes.
LM studio with MLX on Mac is performing perfectly and I can keep it in my ram indefinitely.
Ollama keep alive is broken as a new rest api call resets it after. I’m surprised it’s this glitched with longer running calls and custom context length.
ericb|9 months ago
There's context length, but then, how does that relate to input length and output length? Should I just make the numbers match? 32k is 32k? Any pointers?
lis|9 months ago
Just for ollama, see: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-c...
I’m using llama.cpp though, so I can’t confirm these methods.
zackify|9 months ago
And ollama keeps taking it out of memory every 4 minutes.
LM studio with MLX on Mac is performing perfectly and I can keep it in my ram indefinitely.
Ollama keep alive is broken as a new rest api call resets it after. I’m surprised it’s this glitched with longer running calls and custom context length.