item 40721762

shahahmed | 1 year ago

Arguably you can reduce latency even more by keeping the model on-device as well, but that would mean revealing the weights of the fine-tuned model.

If the user preferred reduced latency and had the RAM, is that an option?
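A rough latency-budget sketch of the trade-off being described: a hosted completion pays a network round trip on top of server inference, while an on-device model pays only (typically slower) local inference. All figures below are illustrative assumptions, not measurements of any real service.

```python
# Perceived latency for a hosted completion: network round trip + inference.
def remote_latency_ms(rtt_ms: float, server_infer_ms: float) -> float:
    return rtt_ms + server_infer_ms

# Perceived latency for an on-device completion: local inference only.
def local_latency_ms(local_infer_ms: float) -> float:
    return local_infer_ms

# Assumed figures: 80 ms round trip, 120 ms on a datacenter GPU,
# 180 ms for the same model on a consumer GPU with enough RAM.
remote = remote_latency_ms(80, 120)  # 200 ms
local = local_latency_ms(180)        # 180 ms
print(remote, local)  # prints: 200 180
```

Under these assumed numbers the local path wins despite slower hardware, because it skips the network entirely; with a fast connection or a much weaker local GPU, the comparison can easily flip.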


daemonologist | 1 year ago

This is true, but only if you have a GPU (or other accelerator) comparable in performance to the one backing the service, or at least comparable once you account for the latency saved by staying local. That is an expensive proposition, because the hardware sits idle between completions and whenever you're not coding.
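The idle-hardware point above can be made concrete with some back-of-the-envelope amortization. All the figures here (hardware price, lifetime, usage rate, API price) are hypothetical assumptions purely to show the shape of the comparison:

```python
# Amortized cost of an owned accelerator, spread over its useful life.
def owned_cost_per_hour(hardware_cost: float, lifetime_hours: float) -> float:
    return hardware_cost / lifetime_hours

# Cost attributed to each completion, given how often completions happen.
def cost_per_completion(cost_per_hour: float, completions_per_hour: float) -> float:
    return cost_per_hour / completions_per_hour

# Assumptions: a $1500 GPU used over 6000 hours, with 25 completions/hour
# while actively coding.
hourly = owned_cost_per_hour(1500, 6000)       # $0.25/hour, ticking even when idle
per_completion = cost_per_completion(hourly, 25)
print(round(per_completion, 4))
```

The key asymmetry: the owned GPU's $0.25/hour accrues around the clock whether or not you're coding, while a hosted service only bills per request, so low utilization makes the local option look much worse than the raw hardware price suggests.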

s1mplicissimus | 1 year ago

The model is probably most of the "secret sauce" of Cody, so if they gave it away, people could copy it around like MP3s. My guess.

morgante | 1 year ago

Completely incorrect: Sourcegraph has not historically trained its own models, and Cody swaps between many open-source and third-party models.