Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):
# prints the first ten prime numbers
def print_primes():
    i = 2
    num_printed = 0 # end of prompt
    while num_printed < 10:
        if is_prime(i):
            print(i)
            num_printed += 1
        i += 1

def is_prime(n):
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def main():
    print_primes()

if __name__ == '__main__':
    main()
It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting.
> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.
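For reference, the paper's long-context measurement is a key-retrieval test: a random passkey is buried at some depth inside filler text and the model is asked to repeat it back. A minimal sketch of how such a prompt can be constructed (the filler sentence and passkey phrasing here are illustrative, not the paper's exact template):

```python
import random

def make_passkey_prompt(context_tokens: int, passkey: int) -> str:
    """Build a long prompt hiding a passkey in filler, then ask for it back."""
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    # Rough estimate: about one token per word of filler.
    n_repeats = max(1, context_tokens // len(filler.split()))
    lines = [filler] * n_repeats
    # Bury the passkey at a random depth in the filler.
    secret = f"The pass key is {passkey}. Remember it. "
    lines.insert(random.randrange(len(lines)), secret)
    return "".join(lines) + "What is the pass key? The pass key is"

prompt = make_passkey_prompt(context_tokens=16000, passkey=68427)
# A model with working long-context retrieval should complete with the passkey.
```

Retrieval accuracy at a given context length is then just the fraction of such prompts the model completes correctly.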
Code Llama Python is very interesting. Specifically tuned for Python.
https://ai.meta.com/blog/code-llama-large-language-model-cod...
I wonder if we could make such specific LLMs (one proficient in all things Rust, another in all things Linux, all things genomics, all things physics modeling, etc.) and have them talk to each other to collaboratively solve problems.
That would be a crazy future thing! Putting machines truly to work.
I think this is called "mixture of experts" and also there's a lot of speculation that it's how GPT-4 works, although probably with just a few large models rather than many small ones.
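For intuition, a mixture-of-experts layer is a learned router that sends each input to only a few specialist subnetworks and mixes their outputs by gate weight. A toy sketch of top-k routing (real MoE layers do this per token with learned gating matrices; the scalar "experts" here are stand-ins):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts and mix their outputs by gate weight."""
    topk = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in topk])
    # Only the selected experts run, which is what keeps inference cheap
    # even when total parameter count is large.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

# Hypothetical "experts": each is just a scalar function here.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
y = moe_forward(3.0, experts, gate_scores=[0.1, 5.0, 4.0], k=2)
```

The speculation about GPT-4 is exactly this trade: most of the parameters sit idle on any given token, so capacity scales without proportional compute.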
There was a similar attempt for Godot script trained a few months ago, and it's reportedly pretty good: https://github.com/minosvasilias/godot-dodo
I think more attempts haven't been made because base Llama is not that great at coding in general, relative to its other strengths, and stuff like StarCoder has flown under the radar.
Start with a CodeLlama for C, and start treating these systems as natural language compilers. C is low level enough and still readable for those rare moments
The best model, Unnatural Code Llama, is not released, likely because it's trained on GPT-4-based data and might violate OpenAI's TOS: per the "Unnatural" paper [1], the "unnatural" data is generated with the help of some LLM, and you would want to use as good an LLM as possible.
[1] https://arxiv.org/pdf/2212.09689.pdf
TheBloke doesn’t joke around [1]. I’m guessing we’ll have the quantized ones by the end of the day. I’m super excited to use the 34B Python 4-bit quantized one that should just fit on a 3090.
[1] https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16
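The back-of-envelope VRAM math for "just fits on a 3090": 4-bit quantization formats actually store a bit over 4 bits per weight once per-block scales are included (~4.5 bits is a common rough figure), and you need headroom for the KV cache and activations on top of the weights:

```python
params = 34e9             # 34B parameters
bits_per_weight = 4.5     # ~4-bit quantization incl. per-block scales (rough figure)
weight_bytes = params * bits_per_weight / 8
weight_gb = weight_bytes / 1024**3
# Roughly 18 GiB of weights, leaving a few GiB for KV cache and
# activations on a 24 GiB RTX 3090. Tight, but it fits.
```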
To run Code Llama locally, the 7B parameter quantized version can be downloaded and run with the open-source tool Ollama: https://github.com/jmorganca/ollama
ollama run codellama "write a python function to add two numbers"
For the instruct variant: `ollama run codellama:7b-instruct`
More models coming soon (completion, Python, and more parameter counts); they're being uploaded as we speak:
https://ollama.ai/library/codellama
https://ollama.ai/blog/run-code-llama-locally
> The Code Llama models provide stable generations with up to 100,000 tokens of context.
Not a bad context window, but makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.
And this makes me further wonder if, when coding with such a tool (or at least a knowledge that they’re becoming more widely used and leaned on), are there some new considerations that we should be applying (or at least starting to think about) when programming? Perhaps having more or fewer comments, perhaps more terse and less readable code that would consume fewer tokens, perhaps different file structures, or even more deliberate naming conventions (like Hungarian notation but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?
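On the "fewer tokens" point, you can get a feel for the cost of style choices by counting tokens. Real models use BPE tokenizers (e.g. tiktoken for OpenAI models, SentencePiece for Llama), but even a crude word/symbol split illustrates the comparison:

```python
import re

def rough_token_count(code: str) -> int:
    """Very rough proxy for a BPE token count: words, symbols, and newlines."""
    return len(re.findall(r"\w+|[^\w\s]|\n", code))

# Two hypothetical versions of the same function.
verbose = """
# Add the tax amount to the subtotal to compute the final total price.
def compute_final_total_price(subtotal_amount, tax_amount):
    final_total_price = subtotal_amount + tax_amount
    return final_total_price
"""

terse = """
def total(subtotal, tax):
    return subtotal + tax
"""

# The terse version costs several times fewer tokens, though real BPE
# counts will differ from this rough proxy.
```

Whether that trade is worth the readability hit is exactly the open question; long descriptive names may also help the model, not just the human.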
Copilot has been working great for me thus far, but it's limited by its interface. It seems like it only knows how to make predictions for the next bit of text.
Is anyone working on a code AI that can suggest refactorings?
"You should pull these lines into a function, it's repetitive"
"You should change this structure so it is easier to use"
I absolutely love the idea of using one of these models without having to upload my source code to a tech giant.
How are people using these local code models? I would much prefer using these in-context in an editor, but most of them seem to be deployed just in an instruction context. There's a lot of value in not having to context switch or have a conversation.
I see the GitHub Copilot extension gets a new release every few days, so is it just that the way they're integrated is more complicated, and not worth the effort?
That’s just one perspective… Another perspective is that LLMs enable programmers to skip a lot of the routine and boring aspects of coding - looking up stuff, essentially - so they can focus on the fun parts that engage creativity.
This should be the only goal of mankind so we can smell the flowers instead of wasting our years in some cubicle. Some people will always want to work, but it shouldn't be the norm. What's the point really unless we're doing something we're passionate about? The economy?
From my experience with GitHub Copilot and GPT-4, developers are NOT going anywhere anytime soon. You'll certainly be faster, though.
The best interpretation of this is that you mean eventually ML/AI will put programmers out of a job, and not Code Llama specifically.
However, it is hard to tell how that might pan out. Can such an ML/AI do all the parts of the job effectively? A lot of non-coding skills bleed into the coder's job: for example, talking to the people who need an input to the task and finding out what they are really asking for, and beyond that, what the best solution is that solves the underlying problem of what they ask for, while meeting nonfunctional requirements such as performance, reliability, and code complexity, and being a good fit for the business.
On the other hand eventually the end users of a lot of services might be bots. You are more likely to have a pricing.json than a pricing.html page, and bots discover the services they need from searches, negotiate deals, read contracts and sue each other etc.
Once the programming job (which is really a "technical problem solver" job) is replaced either it will just be same-but-different (like how most programmers use high level languages not C) or we have invented AGI that will take many other jobs.
In which case the "job" aspect of it is almost moot. Since we will be living in post-scarcity and you would need to figure out the "power" aspect and what it means to even be sentient/human.
I understand the fear of losing your job or becoming less relevant, but many of us love this work because we're passionate about technology, programming, science, and the whole world of possibilities that this makes... possible.
That's why we're so excited to see these extraordinary advances that I personally didn't think I'd see in my lifetime.
The fear is legitimate and I respect the opinions of those who oppose these advances because they have children to provide for and have worked a lifetime to get where they are. But at least in my case, the curiosity and excitement to see what will happen is far greater than my little personal garden.
Damn, we are living what we used to read in the most entertaining sci-fi literature!
(And that's not to say that I don't see the risks in all of this... in fact, I think there will be consequences far more serious than just "losing a job," but I could be wrong)
If we get to the point where these large language models can create complete applications and software solutions from design specs alone, then there's no reason to believe that this would be limited to merely replacing software devs.
It would likely impact a far larger swath of the engineering / design industry.
Interesting that there's a 34B model. That was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks or if the code fine tuning destroyed that. It should be the best model that would still fit on 24GB gaming GPUs with quantization, because 70B doesn't fit.
https://huggingface.co/chargoddard/llama2-22b
Theoretically this is an even better size, as it would fit on a 20GB-24GB GPU with more relaxed quantization and much longer context.
Metrics are slightly below 13B, but the theory is that the higher parameter count is more amenable to finetuning. If you search for 22B on Hugging Face, you can see that frankenllama experiments are ongoing: https://huggingface.co/models?sort=modified&search=22b
Looks like they left out another model, though. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.
Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.
Between this, ideogram.ai (image generator which can spell, from former Google Imagen team member and others), and ChatGPT fine-tuning, this has been a truly epic week.
I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.
SDXL and DeepFloyd can spell. It's more or less just a matter of having a good enough text encoder.
I tried Ideogram yesterday and it felt too much like existing generators (base SD and Midjourney). DALLE2 actually has some interestingly different outputs, the problem is they never update it or fix the bad image quality.
I guess since Xcode doesn't have a good plug-in architecture for this, I began experimenting more with a chat interface. So far GPT-4 has seemed quite useful for generating code, reviewing code for certain problems, etc.
Editor plugins are fantastic at completing based on a pattern. That's the main thing you're missing out on, imo: it's worth it to hit tab, but not to copy/paste and say "finish this line for me, it looks almost like the one above."
There's also the real-time aspect where you can see that it's wrong via the virtual text, type a few characters, then it gets what you're doing and you can tab complete the rest of the line.
It's faster to converse with when you don't have to actually have a conversation, if that makes sense? The feedback loop is much shorter and doesn't require natural language, or nearly as much context switching.
Seriously, I was expecting to read the article and find them on a level on par with GPT-4 or higher. For all this chat about how Google/Facebook have been in the AI space longer than OpenAI, their products don't speak to that.
I can't wait for some models fine tuned on other languages. I'm not a Python developer, so I downloaded the 13B-instruct variant (4 bit quantized Q4_K_M) and it's pretty bad at doing javascript. I asked it to write me a basic React Native component that has a name prop and displays that name. Once it returned a regular React component, and when I asked it to make sure it uses React Native components, it said sure and outputted a bunch of random CSS and an HTML file that was initializing a React project.
It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.
Anyone know of a docker image that provides an HTTP API interface to Llama? I'm looking for a super simple sort of 'drop-in' solution like that which I can add to my web stack, to enable LLM in my web app.
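A few drop-in options exist: llama.cpp ships a server example exposing an HTTP API, llama-cpp-python provides an OpenAI-compatible server, and Ollama (mentioned above) serves a REST API on localhost:11434 and can be run in Docker. A minimal stdlib client sketch against Ollama's generate endpoint (the model name assumes you've pulled codellama locally):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_request(prompt: str, model: str = "codellama") -> request.Request:
    """Build a POST to a locally running Ollama server's generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(OLLAMA_URL, data=body,
                           headers={"Content-Type": "application/json"})

def generate(prompt: str) -> str:
    """Send the prompt and return the model's completion text."""
    with request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]

# Example, assuming `ollama serve` is running with the codellama model pulled:
# print(generate("write a python function to add two numbers"))
```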
This is great for asking questions like "how do I do x with y" and "this code <<some code>> isn't working, what's wrong?" Much faster than googling, or a great basis for forming a more accurate Google search.
Where it's a bit shit is when it's used to provide autosuggest. It hallucinates plausible-sounding functions/names, which for me personally are hard to spot when they are wrong (I suspect that's a function of the plugin).
Hallucinations can be reduced by incorporating retrieval-augmented generation (RAG) on the front end; function library defs could likely be automagically entered as prompt/memory inputs.
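To sketch that idea (with a trivial bag-of-words retriever standing in for real embedding search, which is not how production RAG scores relevance):

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Return the k docs most similar to the query (stand-in for embedding search)."""
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

# Hypothetical function-library defs to ground the model's completions:
defs = [
    "def parse_config(path): load YAML config from path",
    "def send_email(to, subject, body): send mail via SMTP",
]
context = retrieve("how do I send an email", defs)
# Prepend the retrieved defs so the model completes against real names,
# instead of hallucinating plausible-sounding ones.
prompt = "\n".join(context) + "\n\n# Q: how do I send an email?\n"
```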
Why wouldn’t they provide a hosted version? Seems like a no brainer… they have the money, the hardware, the bandwidth, the people to build support for it, and they could design the experience and gather more learning data about usage in the initial stages, while putting a dent in ChatGPT commercial prospects, and all while still letting others host and use it elsewhere. I don’t get it. Maybe it was just the fastest option?