ggerganov | 1 year ago

Hi HN, happy to see this here!

I highly recommend taking a look at the technical details of the server implementation that enables large-context usage with this plugin - I think it is interesting and has some cool ideas [0].

Also, the same plugin is available for VS Code [1].

Let me know if you have any questions about the plugin - happy to explain. Btw, the performance has improved compared to what is seen in the README videos thanks to client-side caching.

[0] - https://github.com/ggerganov/llama.cpp/pull/9787

[1] - https://github.com/ggml-org/llama.vscode
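The client-side caching mentioned above could be sketched roughly like this (a minimal illustration, not the plugin's actual code): completions are cached keyed by the surrounding context, so repeated requests at a recently seen cursor position can be served locally instead of re-querying the server.

```python
from collections import OrderedDict

class CompletionCache:
    """Minimal LRU cache for completions, keyed by (prefix, suffix).

    A sketch of the idea behind client-side caching: if the editor asks
    for a completion at a context it has seen recently, reuse the old
    result instead of hitting the server again.
    """

    def __init__(self, capacity=250):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, prefix, suffix):
        key = (prefix, suffix)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as recently used
            return self._entries[key]
        return None  # cache miss: the client would query the server

    def put(self, prefix, suffix, completion):
        key = (prefix, suffix)
        self._entries[key] = completion
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

The real plugin's cache is more sophisticated (see the linked PR for details), but the latency win is the same: a hit costs nothing, while a miss costs a round-trip to the server.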

amrrs|1 year ago

For those who don't know, he is the gg of `gguf`. Thank you for all your contributions! Your work is literally the core of Ollama, LM Studio, Jan, and multiple other apps!

kennethologist|1 year ago

A. Legend. Thanks for having DeepSeek available so quickly in LM Studio.

bangaladore|1 year ago

Quick testing on VS Code to see if I'd consider replacing Copilot with this. The biggest showstopper for me right now is that the output length is substantially small. The default length is set to 256, but even if I up it to 4096, I'm not getting any larger chunks of code.

Is this because of a max-latency setting, or the internal prompt, or am I doing something wrong? Or is it only really meant to autocomplete lines and not blocks like Copilot will?

Thanks :)

ggerganov|1 year ago

There are 4 stopping criteria atm:

- Generation time exceeded (configurable in the plugin config)

- Number of tokens exceeded (not the case since you increased it)

- Indentation - stops generating if the next line has shorter indent than the first line

- Small probability of the sampled token

Most likely you are hitting the last criterion. It's something that should be improved in some way, but I am not very sure how. Currently, it uses a very basic token-sampling strategy with custom threshold logic to stop generating when the token probability is too low. Likely this logic is too conservative.
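The four criteria above can be sketched as a single stopping predicate (a sketch with hypothetical parameter names and default values, not the plugin's real code):

```python
def indent_of(line):
    """Width of a line's leading whitespace."""
    return len(line) - len(line.lstrip(" \t"))

def should_stop(elapsed, n_tokens, next_line, first_line, p_token,
                t_max=1.0, n_max=256, p_min=0.1):
    """Sketch of the four stopping criteria described above.

    elapsed    - seconds spent generating so far
    n_tokens   - tokens generated so far
    next_line  - the line currently being generated
    first_line - the first line of the completion
    p_token    - probability of the most recently sampled token
    """
    if elapsed > t_max:          # 1. generation time exceeded
        return True
    if n_tokens > n_max:         # 2. number of tokens exceeded
        return True
    # 3. next line is indented less than the first line
    if next_line.strip() and indent_of(next_line) < indent_of(first_line):
        return True
    if p_token < p_min:          # 4. sampled token probability too low
        return True
    return False
```

Raising the token limit (`n_max` here) does nothing if criterion 4 fires first, which matches the behavior described in the parent comment.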

eklavya|1 year ago

Thanks for sharing the VS Code link. After trying it, I have disabled the continue.dev extension and Ollama. For me this is wayyyyy faster.

jerpint|1 year ago

Thank you for all of your incredible contributions!

liuliu|1 year ago

KV cache shifting is interesting!

Just curious: how much of your code nowadays completed by LLM?

ggerganov|1 year ago

Yes, I think it is surprising that it works.

I think a fairly large amount, though I can't give a good number. I have been using GitHub Copilot from the very early days, and with the release of Qwen Coder last year I fully switched to local completions. I don't use the chat workflow to code, though - only FIM.
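FIM (fill-in-the-middle) means the model sees the code before and after the cursor, assembled into a structured prompt, and generates only the missing middle. As a rough illustration (the special-token names here follow Qwen2.5-Coder's convention; other FIM-capable models use different tokens):

```python
def build_fim_prompt(prefix, suffix):
    """Assemble a fill-in-the-middle prompt: the model is asked to
    generate the text that belongs between `prefix` and `suffix`.

    The special tokens follow Qwen2.5-Coder's convention; other
    FIM-capable models (CodeLlama, StarCoder, ...) use different ones.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Everything before the cursor is the prefix, everything after is the
# suffix; the model's output is inserted at the cursor position.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

This is why FIM completions can respect code that comes *after* the cursor, unlike a plain chat prompt that only sees what precedes it.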

attentive|1 year ago

Is it correct to assume this plugin won't work with ollama?

If so, what's ollama missing?

nancyp|1 year ago

TIL: Vim has its own language. Thanks, Georgi, for llama.cpp!

nacs|1 year ago

Vim is incredibly extensible.

You can use C or Vimscript, but editors like Neovim support Lua as well, which makes it really easy to write plugins.

halyconWays|1 year ago

Please make one for Jetbrains' IDEs!