wizee | 6 months ago
At small contexts with llama.cpp on my M4 Max, I get 90+ tokens/sec generation and 800+ tokens/sec prompt processing. Even at large contexts like 50k tokens, I still get fairly usable speeds (22 tokens/sec generation).
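For a rough sense of what those rates mean in wall-clock time, here is a minimal sketch that just divides a token count by a rate. The rates are the ones quoted above; the helper name `seconds_for` and the token counts (a 500-token reply, a 4,000-token prompt) are made-up examples, not measurements, and real throughput will vary with model, quantization, and context length.

```python
def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock seconds to handle `tokens` at a steady rate (hypothetical helper)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return tokens / tokens_per_sec


# Assumed example sizes; these are illustrative, not from any benchmark.
REPLY_TOKENS = 500     # hypothetical reply length
PROMPT_TOKENS = 4_000  # hypothetical small-context prompt

# Rates taken from the comment above (treated as steady, which is a simplification).
small_ctx_reply = seconds_for(REPLY_TOKENS, 90)      # generation at small context
large_ctx_reply = seconds_for(REPLY_TOKENS, 22)      # generation at ~50k context
small_ctx_prompt = seconds_for(PROMPT_TOKENS, 800)   # prompt processing at small context

print(f"500-token reply, small context: {small_ctx_reply:.1f} s")
print(f"500-token reply, ~50k context:  {large_ctx_reply:.1f} s")
print(f"4k-token prompt, small context: {small_ctx_prompt:.1f} s")
```

So under these assumptions the same reply takes roughly four times longer at 50k context than at small context, which is slower but still workable for interactive use.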