aazo11 | 10 months ago

I spent a couple of weeks trying out local inference solutions for a project and wrote up my thoughts, with some performance benchmarks, in a blog post.

TLDR -- What these frameworks can do on off-the-shelf laptops is astounding. However, it is very difficult to find and deploy a task-specific model, and the models themselves (even with quantization) are so large that the download would kill the UX for most applications.

codelion | 10 months ago

There are ways to improve the performance of local LLMs with inference-time techniques. You can try optillm (https://github.com/codelion/optillm); it is possible to match the performance of larger models on narrow tasks by doing more work at inference time. A sketch of the general idea is below.
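
Not optillm's exact API, but as a rough illustration of "doing more at inference": the sketch below implements one such technique, self-consistency (sample several answers, keep the majority vote), against a local OpenAI-compatible endpoint of the kind optillm and most local servers expose. The base URL, port, model name, and helper function are placeholder assumptions, not documented optillm interfaces.

    # Minimal sketch of an inference-time technique: self-consistency.
    # Sample n completions at nonzero temperature and majority-vote the answers.
    # Assumes an OpenAI-compatible server (e.g. optillm, llama.cpp server)
    # at the base_url below; URL and model name are placeholders.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    def self_consistent_answer(prompt: str, n: int = 5) -> str:
        """Ask the model n times and return the most common answer."""
        answers = []
        for _ in range(n):
            resp = client.chat.completions.create(
                model="local-model",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,  # nonzero so the samples actually differ
            )
            answers.append(resp.choices[0].message.content.strip())
        # Majority vote; ties fall back to the first answer seen.
        return Counter(answers).most_common(1)[0][0]

    print(self_consistent_answer("What is 17 * 24? Answer with just the number."))

The trade-off is extra latency per query, which fits the narrow-task case: the small local model stays the same, you just spend more compute at inference time instead of downloading a larger model.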