vsolina | 2 years ago
You're definitely right about the feature richness, but the truth is I just want completions :D
Performance is a funny thing: it mostly scales with the slowest part of the system. Since both servers use the same inference library (llama.cpp), which does all the heavy lifting, there's essentially no completion performance difference in single-user mode according to my tests. But because I use a smaller model by default (Q5_K_M instead of Tabby's Q8, ~30% difference in size), and LLM inference is essentially memory-bandwidth bound, my new deployment is around 30% faster on identical hardware, with no noticeable quality difference.
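The size-to-speed argument can be sketched with back-of-envelope numbers. This is just an illustration of the "bandwidth-bound" reasoning above; the model sizes and bandwidth figure are hypothetical, not measurements:

```python
# Token generation in llama.cpp-style inference is roughly memory-bandwidth
# bound: each token streams (most of) the model weights through memory once,
# so throughput ~ bandwidth / model size. Numbers below are hypothetical.

def est_tokens_per_sec(model_size_gb: float, mem_bw_gbps: float) -> float:
    """tokens/s ~= memory bandwidth / bytes read per token (~ model size)."""
    return mem_bw_gbps / model_size_gb

BANDWIDTH = 100.0  # GB/s, hypothetical memory bandwidth

q8 = est_tokens_per_sec(6.5, BANDWIDTH)  # hypothetical Q8 model size
q5 = est_tokens_per_sec(5.0, BANDWIDTH)  # ~30% smaller quant (e.g. Q5_K_M)

print(f"Q8:     {q8:.1f} tok/s")
print(f"Q5_K_M: {q5:.1f} tok/s ({q5 / q8 - 1:+.0%})")
```

With a ~30% smaller model and the same bandwidth, the estimate comes out ~30% faster, matching the observation above.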
p.s. I'd highly recommend providing additional quantization methods in your model repository to make it easier for novice users.
Thank you