Codestral Mamba | WingNews

[+] bhouston|1 year ago|reply

What are the steps required to get this running in VS Code?

If they had linked to the instructions in their post (or better yet a link to a one click install of a VS Code Extension), it would help a lot with adoption.

(BTW I consider it malpractice that they are at the top of hacker news with a model that is of great interest to a large portion of the users where and they do not have a monetizable call to action on the page featured.)

[+] leourbina|1 year ago|reply

If you can run this using ollama, then you should be able to use https://www.continue.dev/ with both IntelliJ and VSCode. Haven’t tried this model yet - but overall this plugin works well.

[+] refulgentis|1 year ago|reply

"All you need is users" doesn't seem optimal IMHO, Stability.ai providing an object lesson in that.

They just released weights, and being a for profit, need to optimize for making money, not eyeballs. It seems wise to guide people to the API offering.

[+] DalasNoin|1 year ago|reply

I feel like local models could be an amazing coding experience because you could disconnect from the internet. Usually I need to open chatgpt or google every so often to solve some issue or generate some function, but this also introduces so many distractions. imagine being able to turn off internet completely and only have a chat assistant that runs locally. I fear though that it is just going to be a bit to slow at generating tokens on CPU to not be annoying.

[+] sleepytimetea|1 year ago|reply

Looking through the Quickstart docs, they have an API that can generate code. However, I don't think they have a way to do "Day 2" code editing.

Also, doesn't seem to have a freemium tier...need to start paying even before trying it out ?

"Our API is currently available through La Plateforme. You need to activate payments on your account to enable your API keys."

[+] PufPufPuf|1 year ago|reply

Currently the best (most user-friendly) way to run models locally is to use Ollama with Continue.dev. This one is not available yet, though: https://github.com/ggerganov/llama.cpp/issues/8519

[+] yogeshp|1 year ago|reply

Website codegpt.co also has a plugin for both VS Code and Intellij. When model becomes available in Ollama, you can connect plugin in VS code to local ollama instance.

[+] antifa|1 year ago|reply

Maybe not this model, but checkout TabbyML for offline/selfhostws LLMs in vscode.

[+] solarkraft|1 year ago|reply

I kinda just want something that can keep up with the original version of Copilot. It was so much better than the crap they’re pumping out now (keeps messing up syntax and only completing a few characters at a time).

[+] terhechte|1 year ago|reply

Have you tried supermaven? (https://supermaven.com). I find it much better than copilot. Using it daily.

[+] razodactyl|1 year ago|reply

Supposedly they were training on feedback provided by the plugin itself but that approach doesn't make sense to me because:

- I don't remember the shortcuts most of the time.

- When I run completions I double take and realise they're wrong.

- I am not a good source of data.

All this information is being fed back into the model as positive feedback. So perhaps reason for it to have gone downhill.

I recall it being amazing at coding back in the day, now I can't trust it.

Of course, it's anecdotal which is also problematic in itself but I have definitely noticed the issue where it will fail and stop autocompleting or provide completely irrelevant code.

[+] heeton|1 year ago|reply

Have you tried supermaven? It replaced copilot for me a couple of months ago.

[+] thot_experiment|1 year ago|reply

Does anyone have a favorite FIM capable model? I've been using codellama-13b through ollama w/ a vim extension i wrote and it's okay but not amazing, I definitely get better code most of the time out of Gemma-27b but no FIM (and for some reason codellama-34b has broken inference for me)

[+] trissi|1 year ago|reply

I use deepseek-coder-7b-instruct-v1.5 & DeepSeek-Coder-V2-Lite-Instruct when I want speed & codestral-22B-v0.1 when I want smartness.

All of those are FIM capable, but especially deepseek-v2-lite is very picky with its prompt template so make sure you use it correctly...

Depending on your hardware codestral-22B might be fast enough for everything, but for me it's a bit to slow...

If you can run it deepseek v2 non-light is amazing, but it requires loads of VRAM

[+] xoranth|1 year ago|reply

Is the extension you wrote public?

[+] sa-code|1 year ago|reply

It's great to see a high-profile model using Mamba2!

[+] imjonse|1 year ago|reply

The MBPP column should bold DeepSeek as it has a better score than Codestral.

[+] smith7018|1 year ago|reply

Which means Codestral Mamba and DeepSeek both lead four benchmarks. Kinda takes the air out the announcement a bit.

[+] attentive|1 year ago|reply

codegeex4-all-9b beats them "on paper" so that's why it's not in the benchmarks.

[+] magnio|1 year ago|reply

They announce the model is on HuggingFace but don't link to it. Here it is: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1

[+] dvfjsdhgfv|1 year ago|reply

The link is already there in the text, they probably just fixed it.

[+] flakiness|1 year ago|reply

So Mamba is supposed to be faster and the article claims that. But they don't have any latency numbers.

Has anyone tried this? And then, is it fast(er)?

[+] monkeydust|1 year ago|reply

Any recommended product primers to Mamba vs Transformers - pros/cons etc?

[+] red2awn|1 year ago|reply

A very good primer to state-space models (from which Mamba is based on) is The Annotated S4 [1]. If you want to dive into the code I wrote a minimal single-file implementation of Mamba-2 here [2].

[1]: https://srush.github.io/annotated-s4/

[2]: https://github.com/tommyip/mamba2-minimal

[+] bhouston|1 year ago|reply

This video is good: https://www.youtube.com/watch?v=N6Piou4oYx8. As are the other videos on the same YouTube account.

[+] flakiness|1 year ago|reply

For those who are text oriented: https://newsletter.maartengrootendorst.com/p/a-visual-guide-...

The paper author has a blog series but I don't think it's for general public https://tridao.me/blog/2024/mamba2-part1-model/

[+] ertgbnm|1 year ago|reply

https://www.youtube.com/watch?v=X5F2X4tF9iM

This is what introduced me to them. May be a bit outdated at this point.

[+] modeless|1 year ago|reply

> Unlike Transformer models, Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length

> We have tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens

Why only 256k tokens? Gemini's context window is 1 million or more and it's (probably) not even using Mamba.

[+] rileyphone|1 year ago|reply

Gemini is probably using ring attention. But scaling to that size requires more engineering effort in terms of interlink that goes beyond the purpose of this release from Mistral.

[+] tatsuya4|1 year ago|reply

Just did a quick test in the https://model.box playground, and it looks like the completion length is noticeably shorter than other models (e.g., gpt-4o). However, the response speed meets expectations..

[+] culopatin|1 year ago|reply

Does anyone have a video or written article that would get one up to speed with a bit of the history/progression and current products that are out there for one to try locally?

This is coming from someone that understands the general concepts of how LLMs work but only used the general publicly available tools like ChatGPT, Claude, etc.

I want to see if I have any hardware I can stress and run something locally, but don’t know where to start or even what are the available options.

[+] Kinrany|1 year ago|reply

Is there a good explanation of the Mamba architecture?

[+] alecco|1 year ago|reply

https://thegradient.pub/mamba-explained/

https://jackcook.com/2024/02/23/mamba.html

https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html

[+] altilunium|1 year ago|reply

https://x.com/bycloudai/status/1813311769047138568

[+] unknown|1 year ago|reply

[deleted]

[+] rjurney|1 year ago|reply

But I JUST switched from GPT4o to Claude! :( Kidding, but it isn't clear how to use this thing, as others have pointed out.

[+] ukuina|1 year ago|reply

What made you switch?

[+] zamalek|1 year ago|reply

Is this the active Codestral model on Le Chat? I got quite some mixed results from it tonight.

[+] localfirst|1 year ago|reply

any sort of evals on how it compares to closed models like chat gpt 4 or open ones like WizardLLM ?

[+] taf2|1 year ago|reply

How does this work in vim?

[+] kristianp|1 year ago|reply

Similarly, is there a way to use it with Kate or Sublime Text?

[+] pzo|1 year ago|reply

weird they compare to deepseek-coder v1.5 when we already have v2.0. Any advantage to use codestral mamba apart from that it's lighter in weights?

[+] sam_goldman_|1 year ago|reply

You can try this model out using OpenAI's API format with this TypeScript SDK: https://github.com/token-js/token.js

You just need a Mistral API key: https://console.mistral.ai/api-keys/

[+] croemer|1 year ago|reply

The first sentence is wrong. The website says:

> As a tribute to Cleopatra, whose glorious destiny ended in tragic snake circumstances

but according to Wikipedia this is not true:

> When Cleopatra learned that Octavian planned to bring her to his Roman triumphal procession, she killed herself by poisoning, contrary to the popular belief that she was bitten by an asp.

[+] skybrian|1 year ago|reply

Yes, that seems to be a myth, but exact circumstances seem rather uncertain according to the Wikipedia article [1]:

> [A]ccording to the Roman-era writers Strabo, Plutarch, and Cassius Dio, Cleopatra poisoned herself using either a toxic ointment or by introducing the poison with a sharp implement such as a hairpin. Modern scholars debate the validity of ancient reports involving snakebites as the cause of death and whether she was murdered. Some academics hypothesize that her Roman political rival Octavian forced her to kill herself in a manner of her choosing. The location of Cleopatra's tomb is unknown. It was recorded that Octavian allowed for her and her husband, the Roman politician and general Mark Antony, who stabbed himself with a sword, to be buried together properly.

I think this rounds to “nobody really knows.”

The “glorious destiny” seems kind of shaky, too. It’s just a throwaway line anyway.

[1] https://en.m.wikipedia.org/wiki/Death_of_Cleopatra

[+] ljsprague|1 year ago|reply

What bothers me more is that the legend is that she was killed by an asp, not a mamba.

138 comments