
Code Llama, a state-of-the-art large language model for coding

970 points | marcopicentini | 2 years ago | ai.meta.com | reply

501 comments

[+] daemonologist|2 years ago|reply
Works nearly out of the box with llama.cpp, which makes it easy to try locally: https://github.com/ggerganov/llama.cpp/issues/2766

Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):

    # prints the first ten prime numbers 
    def print_primes(): 
        i = 2 
        num_printed = 0 # end of prompt
        while num_printed < 10:
            if is_prime(i):
                print(i)
                num_printed += 1
            i += 1

    def is_prime(n):
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    def main():
        print_primes()

    if __name__ == '__main__':
        main()

It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting.
[+] redox99|2 years ago|reply
The highlight IMO

> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.

Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.

[+] reacharavindh|2 years ago|reply
Code Llama Python is very interesting. Specifically tuned for Python.

I wonder if we could make such specific LLMs (one proficient in all things Rust, another in all things Linux, all things genomics, all things physics modeling, etc.) and have them talk to each other to collaboratively solve problems.

That would be a crazy future thing! Putting machines truly to work...

[+] esperent|2 years ago|reply
I think this is called "mixture of experts" and also there's a lot of speculation that it's how GPT-4 works, although probably with just a few large models rather than many small ones.
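The routing idea behind a mixture of experts can be sketched in a few lines of plain Python. This is a toy illustration only, not a claim about how GPT-4 works; the gate scores and the "experts" here are made up:

```python
import math

# Toy mixture-of-experts: a gating function scores each expert for a given
# input, only the top-k experts are actually run, and their outputs are
# combined using softmax weights over the selected experts' scores.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts, gate_scores, top_k=2):
    # Pick the top-k experts by gate score (this sparsity is the whole
    # point: most experts never run for a given input).
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    # Weighted combination of the chosen experts' outputs.
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# Hypothetical "experts": each is just a different function of x.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
gate_scores = [0.1, 3.0, 2.0]   # made-up router scores for this input

y = mixture_of_experts(5.0, experts, gate_scores)
```

In a real MoE transformer the experts are feed-forward sublayers and the gate is a learned linear layer, but the select-then-blend structure is the same.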
[+] brucethemoose2|2 years ago|reply
If you can find a large body of good, permissively licensed example code, you can finetune an LLM on it!

There was a similar attempt for Godot's GDScript a few months ago, and it's reportedly pretty good:

https://github.com/minosvasilias/godot-dodo

I think more attempts haven't been made because base Llama is not that great at coding in general, relative to its other strengths, and stuff like StarCoder has flown under the radar.

[+] bbor|2 years ago|reply
Mark my words: you’ve caught a glimpse of the near future :). Google “Society of Mind” if you’re not yet familiar
[+] seydor|2 years ago|reply
Start with a CodeLlama for C, and start treating these systems as natural language compilers. C is low level enough and still readable for those rare moments
[+] Palmik|2 years ago|reply
The best model, Unnatural Code Llama, is not released. Likely because it's trained on GPT-4-derived data, which might violate OpenAI's TOS: as per the "Unnatural Instructions" paper [1], the "unnatural" data is generated with the help of some LLM, and you would want to use as good an LLM as possible.

[1] https://arxiv.org/pdf/2212.09689.pdf

[+] redox99|2 years ago|reply
The good thing is that if it's only finetuned on 15k instructions, we should see a community made model like that very soon.
[+] syntaxing|2 years ago|reply
TheBloke doesn’t joke around [1]. I’m guessing we’ll have the quantized ones by the end of the day. I’m super excited to use the 34B Python 4 bit quantized one that should just fit on a 3090.

[1] https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16
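Rough back-of-the-envelope arithmetic for why a 4-bit 34B model should just fit on a 3090's 24 GB. The ~4.5 bits/weight figure is an assumption (4-bit quantization plus scales/zero-points), and this ignores KV cache and runtime overhead, which add a few more GB:

```python
def quantized_weight_gb(n_params_billion, bits_per_weight):
    """Approximate weight memory in GB for a quantized model."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 34B parameters at ~4.5 bits/weight:
print(quantized_weight_gb(34, 4.5))   # 19.125 GB of weights -> fits in 24 GB
# Same arithmetic shows why a 4-bit 70B does not fit on a single 24 GB card:
print(quantized_weight_gb(70, 4.5))   # 39.375 GB
```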

[+] stuckinhell|2 years ago|reply
What kind of CPU/GPU power do you need for quantization or these new GGUF formats?
[+] UncleOxidant|2 years ago|reply
If I don't want to run this locally is it runnable somewhere on huggingface?
[+] suyash|2 years ago|reply
Can it be quantized further so it can run locally on a developer's normal laptop?
[+] jmorgan|2 years ago|reply
To run Code Llama locally, the 7B parameter quantized version can be downloaded and run with the open-source tool Ollama: https://github.com/jmorganca/ollama

    ollama run codellama "write a python function to add two numbers"

More models coming soon (completion, Python, and more parameter counts)
[+] benvolio|2 years ago|reply
>The Code Llama models provide stable generations with up to 100,000 tokens of context.

Not a bad context window, but makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.

And this makes me further wonder if, when coding with such a tool (or at least a knowledge that they’re becoming more widely used and leaned on), are there some new considerations that we should be applying (or at least starting to think about) when programming? Perhaps having more or fewer comments, perhaps more terse and less readable code that would consume fewer tokens, perhaps different file structures, or even more deliberate naming conventions (like Hungarian notation but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?
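One naive way an embedded code model could pick context from an oversized codebase is greedy packing: score each file for relevance to the current task, then add files in score order until the token budget is spent. A toy sketch, where the substring-based scoring and the 4-characters-per-token estimate are both my own crude assumptions (real tools use proper tokenizers and embedding search):

```python
def estimate_tokens(text):
    # Crude heuristic: roughly 4 characters per token for code/English.
    return max(1, len(text) // 4)

def pack_context(files, query_terms, budget_tokens):
    """files: {path: source}. Returns the paths chosen for the prompt."""
    def score(src):
        # Relevance = how many query terms appear in the file.
        return sum(term in src for term in query_terms)

    ranked = sorted(files, key=lambda p: score(files[p]), reverse=True)
    chosen, used = [], 0
    for path in ranked:
        cost = estimate_tokens(files[path])
        if used + cost <= budget_tokens:
            chosen.append(path)
            used += cost
    return chosen

files = {
    "auth.py": "def login(user): ...",
    "billing.py": "def invoice(customer): ...",
    "readme.md": "project overview",
}
print(pack_context(files, ["login", "user"], budget_tokens=10))
# -> ['auth.py', 'readme.md']  (billing.py is skipped: it would bust the budget)
```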

[+] lordnacho|2 years ago|reply
Copilot has been working great for me thus far, but it's limited by its interface. It seems like it only knows how to make predictions for the next bit of text.

Is anyone working on a code AI that can suggest refactorings?

"You should pull these lines into a function, it's repetitive"

"You should change this structure so it is easier to use"

Etc

[+] Draiken|2 years ago|reply
As a complete noob at actually running these models, what kind of hardware are we talking here? Couldn't pick that up from the README.

I absolutely love the idea of using one of these models without having to upload my source code to a tech giant.

[+] scriptsmith|2 years ago|reply
How are people using these local code models? I would much prefer using these in-context in an editor, but most of them seem to be deployed just in an instruction context. There's a lot of value to not having to context switch, or have a conversation.

I see the GitHub Copilot extension gets a new release once every few days, so is it just that the way they're integrated is more complicated, and so not worth the effort?

[+] mymac|2 years ago|reply
Never before in the history of mankind was a group so absolutely besotted with the idea of putting themselves out of a job.
[+] ttul|2 years ago|reply
That’s just one perspective… Another perspective is that LLMs enable programmers to skip a lot of the routine and boring aspects of coding - looking up stuff, essentially - so they can focus on the fun parts that engage creativity.
[+] worksonmine|2 years ago|reply
This should be the only goal of mankind so we can smell the flowers instead of wasting our years in some cubicle. Some people will always want to work, but it shouldn't be the norm. What's the point really unless we're doing something we're passionate about? The economy?
[+] thewataccount|2 years ago|reply
Is automation not what every engineer strives for when possible? Especially software developers.

From my experience with github copilot and GPT4 - developers are NOT going anywhere anytime soon. You'll certainly be faster though.

[+] quickthrower2|2 years ago|reply
The best interpretation of this is you mean eventually ML/AI will put programmers out of a job, and not Code LLama specifically.

However, it is hard to tell how that might pan out. Can such an ML/AI do all the parts of the job effectively? A lot of non-coding skills bleed into the coder's job: for example, talking to the people who need an input to the task and finding out what they are really asking for, and beyond that, what the best solution is to the underlying problem, while meeting nonfunctional requirements such as performance, reliability, and code complexity, and being a good fit for the business.

On the other hand eventually the end users of a lot of services might be bots. You are more likely to have a pricing.json than a pricing.html page, and bots discover the services they need from searches, negotiate deals, read contracts and sue each other etc.

Once the programming job (which is really a "technical problem solver" job) is replaced either it will just be same-but-different (like how most programmers use high level languages not C) or we have invented AGI that will take many other jobs.

In which case the "job" aspect of it is almost moot. Since we will be living in post-scarcity and you would need to figure out the "power" aspect and what it means to even be sentient/human.

[+] kbrannigan|2 years ago|reply
Do you really want to spend your days writing Redux reducers?
[+] 037|2 years ago|reply
I understand the fear of losing your job or becoming less relevant, but many of us love this work because we're passionate about technology, programming, science, and the whole world of possibilities that this makes... possible.

That's why we're so excited to see these extraordinary advances that I personally didn't think I'd see in my lifetime.

The fear is legitimate and I respect the opinions of those who oppose these advances because they have children to provide for and have worked a lifetime to get where they are. But at least in my case, the curiosity and excitement to see what will happen is far greater than my little personal garden. Damn, we are living what we used to read in the most entertaining sci-fi literature!

(And that's not to say that I don't see the risks in all of this... in fact, I think there will be consequences far more serious than just "losing a job," but I could be wrong)

[+] yborg|2 years ago|reply
When mechanized textile machinery was invented, the weavers that had jobs after their introduction were those that learned how to use them.
[+] vunderba|2 years ago|reply
If we get to the point where these large language models can create complete applications and software solutions from design specs alone, then there's no reason to believe that this would be limited to merely replacing software devs.

It would likely impact a far larger swath of the engineering / design industry.

[+] modeless|2 years ago|reply
Interesting that there's a 34B model. That was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks or if the code fine tuning destroyed that. It should be the best model that would still fit on 24GB gaming GPUs with quantization, because 70B doesn't fit.
[+] brucethemoose2|2 years ago|reply
Someone "grafted" llama 33B onto llama v2 13B to make "llama 22B"

https://huggingface.co/chargoddard/llama2-22b

Theoretically this is an even better size, as it would fit on a 20GB-24GB GPU with more relaxed quantization and much longer context.

Metrics are slightly below 13B, but the theory is that the higher parameter count is more amenable to finetuning. If you search for 22B on huggingface, you can see that frankenllama experiments are ongoing:

https://huggingface.co/models?sort=modified&search=22b

[+] nabakin|2 years ago|reply
Looks like they left out another model though. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.

Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.
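For reference, the pass@1 / pass@100 numbers in these benchmarks are computed with the standard unbiased estimator from the HumanEval/Codex paper: generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k random draws passes:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer failures than draws: some draw must be a passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers (not from the paper): 200 samples, 120 passing.
print(pass_at_k(200, 120, 1))    # 0.6 -- pass@1 reduces to raw accuracy c/n
print(pass_at_k(200, 120, 100))  # very close to 1.0
```

This is why pass@100 can be high even when pass@1 is modest: with many tries, one passing sample is enough.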

[+] redox99|2 years ago|reply
I can't imagine it being better than Llama1 33B, after all this code finetuning.
[+] ilaksh|2 years ago|reply
Between this, ideogram.ai (image generator which can spell, from former Google Imagen team member and others), and ChatGPT fine-tuning, this has been a truly epic week.

I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.

[+] astrange|2 years ago|reply
SDXL and DeepFloyd can spell. It's more or less just a matter of having a good enough text encoder.

I tried Ideogram yesterday and it felt too much like existing generators (base SD and Midjourney). DALLE2 actually has some interestingly different outputs, the problem is they never update it or fix the bad image quality.

[+] ShamelessC|2 years ago|reply
Did ideogram release a checkpoint?
[+] WhitneyLand|2 years ago|reply
How much am I missing out on with tools like this or Copilot, compared to using GPT-4?

I guess since Xcode doesn’t have a good plug-in architecture for this, I began experimenting more with a chat interface.

So far gpt-4 has seemed quite useful for generating code, reviewing code for certain problems, etc.

[+] citruscomputing|2 years ago|reply
Editor plugins are fantastic at completing based on a pattern. That's the main thing you're missing out on imo - it's worth it to hit tab, but not to copy/paste and say "finish this line for me, it looks almost like the one above."

There's also the real-time aspect where you can see that it's wrong via the virtual text, type a few characters, then it gets what you're doing and you can tab complete the rest of the line.

It's faster to converse with when you don't have to actually have a conversation, if that makes sense? The feedback loop is much shorter and doesn't require natural language, or nearly as much context switching.

[+] 1024core|2 years ago|reply
If GPT-4's accuracy is 67% and this is 54%, how can these guys claim to be SOTA?
[+] rgbrgb|2 years ago|reply
This runs locally on a MacBook.
[+] binreaper|2 years ago|reply
Seriously, I was expecting to read the article and find them on a level on par with GPT-4 or higher. For all the chat about how long Google/Facebook have been in the AI space compared to OpenAI, their products don't speak to that.
[+] gorbypark|2 years ago|reply
I can't wait for some models fine tuned on other languages. I'm not a Python developer, so I downloaded the 13B-instruct variant (4 bit quantized Q4_K_M) and it's pretty bad at doing javascript. I asked it to write me a basic React Native component that has a name prop and displays that name. Once it returned a regular React component, and when I asked it to make sure it uses React Native components, it said sure and outputted a bunch of random CSS and an HTML file that was initializing a React project.

It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.

[+] TheRealClay|2 years ago|reply
Anyone know of a docker image that provides an HTTP API interface to Llama? I'm looking for a super simple sort of 'drop-in' solution like that which I can add to my web stack, to enable LLM in my web app.
[+] KaiserPro|2 years ago|reply
This is great for asking questions like "how do I do X with Y" and "this code <<some code>> isn't working, what's wrong?" Much faster than googling, or a great basis for forming a more accurate Google search.

Where it's a bit shit is when it's used to provide auto-suggest. It hallucinates plausible-sounding functions/names, which for me personally are hard to spot if they are wrong (I suspect that's a function of the plugin).

[+] SubiculumCode|2 years ago|reply
Hallucinations can be reduced by incorporating retrieval-augmented generation (RAG) on the front end. Likely function/library definitions could be entered automagically as prompt/memory inputs.
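A minimal sketch of that idea: look up the signatures of functions the user mentions and prepend them to the prompt, so the model completes against real definitions instead of inventing plausible ones. The function index, names, and substring-matching "retrieval" here are all hypothetical; a real RAG setup would use embedding search over the codebase:

```python
# Hypothetical library function signatures, acting as a retrieval corpus.
FUNCTION_INDEX = {
    "parse_config": "def parse_config(path: str) -> dict: ...",
    "send_email": "def send_email(to: str, subject: str, body: str) -> None: ...",
}

def retrieve_defs(user_request):
    # Toy retrieval: include a signature only if its name is mentioned.
    return [sig for name, sig in FUNCTION_INDEX.items() if name in user_request]

def build_prompt(user_request):
    context = "\n".join(retrieve_defs(user_request))
    return f"# Known functions:\n{context}\n\n# Task: {user_request}\n"

prompt = build_prompt("call parse_config and print the result")
print(prompt)
```

The model then has the true signature in context, which makes hallucinating a nonexistent `parse_config` overload much less likely.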
[+] natch|2 years ago|reply
Why wouldn’t they provide a hosted version? Seems like a no brainer… they have the money, the hardware, the bandwidth, the people to build support for it, and they could design the experience and gather more learning data about usage in the initial stages, while putting a dent in ChatGPT commercial prospects, and all while still letting others host and use it elsewhere. I don’t get it. Maybe it was just the fastest option?
[+] redox99|2 years ago|reply
Probably the researchers at meta are only interested in research, and productionizing this would be up to other teams.