top | item 40977103

Codestral Mamba

485 points | tosh | 1 year ago | mistral.ai

138 comments

[+] bhouston|1 year ago|reply
What are the steps required to get this running in VS Code?

If they had linked to the instructions in their post (or better yet a link to a one click install of a VS Code Extension), it would help a lot with adoption.

(BTW, I consider it malpractice that they are at the top of Hacker News with a model that is of great interest to a large portion of the users here, and they do not have a monetizable call to action featured on the page.)

[+] leourbina|1 year ago|reply
If you can run this using ollama, then you should be able to use https://www.continue.dev/ with both IntelliJ and VSCode. Haven’t tried this model yet - but overall this plugin works well.
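For anyone wiring this up by hand: once the model lands in Ollama, you can also skip the plugin and hit Ollama's local HTTP API directly. A minimal sketch (the "codestral-mamba" model tag is a guess; check `ollama list` for the actual name once it ships):

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot completions.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "codestral-mamba") -> dict:
    # "codestral-mamba" is a placeholder tag; substitute whatever
    # `ollama list` reports on your machine.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    # POST the JSON payload and return the model's completion text.
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(build_payload("def fibonacci(n):"))
```

Plugins like continue.dev do essentially this under the hood, plus prompt assembly and editor integration.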
[+] refulgentis|1 year ago|reply
"All you need is users" doesn't seem optimal IMHO, Stability.ai providing an object lesson in that.

They just released weights, and being a for profit, need to optimize for making money, not eyeballs. It seems wise to guide people to the API offering.

[+] DalasNoin|1 year ago|reply
I feel like local models could be an amazing coding experience because you could disconnect from the internet. Usually I need to open ChatGPT or Google every so often to solve some issue or generate some function, but this also introduces so many distractions. Imagine being able to turn off the internet completely and only have a chat assistant that runs locally. I fear, though, that it is just going to be a bit too slow at generating tokens on CPU to not be annoying.
[+] sleepytimetea|1 year ago|reply
Looking through the Quickstart docs, they have an API that can generate code. However, I don't think they have a way to do "Day 2" code editing.

Also, there doesn't seem to be a freemium tier...you need to start paying even before trying it out?

"Our API is currently available through La Plateforme. You need to activate payments on your account to enable your API keys."

[+] yogeshp|1 year ago|reply
The website codegpt.co also has a plugin for both VS Code and IntelliJ. When the model becomes available in Ollama, you can connect the VS Code plugin to a local Ollama instance.
[+] antifa|1 year ago|reply
Maybe not this model, but check out TabbyML for offline/self-hosted LLMs in VS Code.
[+] solarkraft|1 year ago|reply
I kinda just want something that can keep up with the original version of Copilot. It was so much better than the crap they’re pumping out now (keeps messing up syntax and only completing a few characters at a time).
[+] razodactyl|1 year ago|reply
Supposedly they were training on feedback provided by the plugin itself but that approach doesn't make sense to me because:

- I don't remember the shortcuts most of the time.

- When I run completions I do a double take and realise they're wrong.

- I am not a good source of data.

All this information is being fed back into the model as positive feedback, so that's perhaps a reason for it to have gone downhill.

I recall it being amazing at coding back in the day, now I can't trust it.

Of course, it's anecdotal which is also problematic in itself but I have definitely noticed the issue where it will fail and stop autocompleting or provide completely irrelevant code.

[+] heeton|1 year ago|reply
Have you tried supermaven? It replaced copilot for me a couple of months ago.
[+] thot_experiment|1 year ago|reply
Does anyone have a favorite FIM capable model? I've been using codellama-13b through ollama with a vim extension I wrote, and it's okay but not amazing. I definitely get better code most of the time out of Gemma-27b, but no FIM (and for some reason codellama-34b has broken inference for me).
[+] trissi|1 year ago|reply
I use deepseek-coder-7b-instruct-v1.5 & DeepSeek-Coder-V2-Lite-Instruct when I want speed & codestral-22B-v0.1 when I want smartness.

All of those are FIM capable, but especially deepseek-v2-lite is very picky with its prompt template so make sure you use it correctly...

Depending on your hardware codestral-22B might be fast enough for everything, but for me it's a bit too slow...

If you can run it, deepseek v2 non-Lite is amazing, but it requires loads of VRAM.
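The DeepSeek FIM template the parent warns about can be sketched like this (the sentinel tokens below are from DeepSeek-Coder's published prompt format; verify them against your model's tokenizer config before relying on this, since the models are picky about exact templates):

```python
# Fill-in-the-middle (FIM) prompt for DeepSeek-Coder-style models:
# the model is given the code before and after a hole and asked to
# generate what goes in between.
def deepseek_fim_prompt(prefix: str, suffix: str) -> str:
    # Sentinel tokens use fullwidth bars and the ▁ character;
    # getting these exactly right is what "picky" means here.
    return f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

prompt = deepseek_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))\n",
)
```

Other model families (e.g. Codestral, CodeLlama) use different sentinel tokens for the same prefix/hole/suffix idea, so the template has to match the model.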

[+] xoranth|1 year ago|reply
Is the extension you wrote public?
[+] sa-code|1 year ago|reply
It's great to see a high-profile model using Mamba2!
[+] imjonse|1 year ago|reply
The MBPP column should bold DeepSeek as it has a better score than Codestral.
[+] smith7018|1 year ago|reply
Which means Codestral Mamba and DeepSeek each lead four benchmarks. Kinda takes the air out of the announcement a bit.
[+] attentive|1 year ago|reply
codegeex4-all-9b beats them "on paper" so that's why it's not in the benchmarks.
[+] flakiness|1 year ago|reply
So Mamba is supposed to be faster, and the article claims as much, but they don't give any latency numbers.

Has anyone tried this? And then, is it fast(er)?

[+] monkeydust|1 year ago|reply
Any recommended product primers to Mamba vs Transformers - pros/cons etc?
[+] modeless|1 year ago|reply
> Unlike Transformer models, Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length

> We have tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens

Why only 256k tokens? Gemini's context window is 1 million or more and it's (probably) not even using Mamba.
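The linear-time claim is about per-token work: a transformer re-attends over the whole context for every new token, while a Mamba-style model carries a fixed-size state forward. A toy cost model of the totals (illustrative arithmetic only, not a benchmark of either architecture):

```python
# Toy per-token cost model: attention reads the whole context for each
# new token, so total work grows quadratically with sequence length;
# a fixed-size recurrent/SSM state does constant work per token, so
# total work grows linearly.
def total_cost(n_tokens: int, per_token_cost) -> int:
    return sum(per_token_cost(t) for t in range(1, n_tokens + 1))

attention = total_cost(1000, lambda t: t)  # ~ n^2 / 2 total work
ssm = total_cost(1000, lambda t: 1)        # ~ n total work
print(attention, ssm)  # 500500 1000
```

This is why long contexts are cheap for Mamba in principle; whether retrieval quality actually holds up past 256k is the separate, empirical question the post leaves open.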

[+] rileyphone|1 year ago|reply
Gemini is probably using ring attention. But scaling to that size requires more engineering effort in terms of interconnect, which goes beyond the purpose of this release from Mistral.
[+] tatsuya4|1 year ago|reply
Just did a quick test in the https://model.box playground, and it looks like the completion length is noticeably shorter than other models' (e.g., gpt-4o). However, the response speed meets expectations.
[+] culopatin|1 year ago|reply
Does anyone have a video or written article that would get one up to speed with a bit of the history/progression and current products that are out there for one to try locally?

This is coming from someone that understands the general concepts of how LLMs work but only used the general publicly available tools like ChatGPT, Claude, etc.

I want to see if I have any hardware I can stress and run something locally, but don’t know where to start or even what are the available options.

[+] rjurney|1 year ago|reply
But I JUST switched from GPT4o to Claude! :( Kidding, but it isn't clear how to use this thing, as others have pointed out.
[+] ukuina|1 year ago|reply
What made you switch?
[+] zamalek|1 year ago|reply
Is this the active Codestral model on Le Chat? I got quite some mixed results from it tonight.
[+] localfirst|1 year ago|reply
any sort of evals on how it compares to closed models like chat gpt 4 or open ones like WizardLLM ?
[+] taf2|1 year ago|reply
How does this work in vim?
[+] kristianp|1 year ago|reply
Similarly, is there a way to use it with Kate or Sublime Text?
[+] pzo|1 year ago|reply
Weird that they compare to deepseek-coder v1.5 when we already have v2.0. Any advantage to using Codestral Mamba apart from it being lighter in weights?
[+] croemer|1 year ago|reply
The first sentence is wrong. The website says:

> As a tribute to Cleopatra, whose glorious destiny ended in tragic snake circumstances

but according to Wikipedia this is not true:

> When Cleopatra learned that Octavian planned to bring her to his Roman triumphal procession, she killed herself by poisoning, contrary to the popular belief that she was bitten by an asp.

[+] skybrian|1 year ago|reply
Yes, that seems to be a myth, but the exact circumstances seem rather uncertain according to the Wikipedia article [1]:

> [A]ccording to the Roman-era writers Strabo, Plutarch, and Cassius Dio, Cleopatra poisoned herself using either a toxic ointment or by introducing the poison with a sharp implement such as a hairpin. Modern scholars debate the validity of ancient reports involving snakebites as the cause of death and whether she was murdered. Some academics hypothesize that her Roman political rival Octavian forced her to kill herself in a manner of her choosing. The location of Cleopatra's tomb is unknown. It was recorded that Octavian allowed for her and her husband, the Roman politician and general Mark Antony, who stabbed himself with a sword, to be buried together properly.

I think this rounds to “nobody really knows.”

The “glorious destiny” seems kind of shaky, too. It’s just a throwaway line anyway.

[1] https://en.m.wikipedia.org/wiki/Death_of_Cleopatra

[+] ljsprague|1 year ago|reply
What bothers me more is that the legend is that she was killed by an asp, not a mamba.
[+] rjurney|1 year ago|reply
I believe this is in dispute among sources.