Yes but: if the commitment is driven by internal researchers and coders standing firm about making their work open source (a rumour I’ve heard a couple times), most of the credit goes to them.
Redistributable, free-to-use weights do not make a model open source (even if the release is really nice, given that very few people have access to that kind of training power).
I'm not very plugged into how to use these models, but I do love and pay for both ChatGPT and GitHub Copilot. How does one take a model like this (or a smaller version) and leverage it in VS Code? There's a dizzying array of GPT-wrapper extensions for VS Code, many of which either seem like junk (10 downloads, no updates in a year) or just lead to another paid plan, at which point I might as well keep my GH Copilot. Curious what others are doing here for Copilot-esque code completion without Copilot.
Curious what's the current SOTA local copilot model? Are there any extensions in vscode that give you a similar experience? I'd love something more powerful than copilot for local use (I have a 4090, so I should be able to run a decent number of models).
This is a completely fair but open question. Not to be a typical HN user, but when you say SOTA local, the real question is which benchmarks you care about for evaluation: size, operability, complexity, explainability, etc.
Working out which copilot models perform best has been a deep exercise for me; it has really made me examine my own coding style, what I find important, and what I look for when investigating models (and when evaluating interview candidates).
I think the three benchmarks & leaderboards most people go to are:
https://huggingface.co/spaces/bigcode/bigcode-models-leaderb... - the most widely understood, broad-language-capability leaderboard, relying on well-understood evaluations and benchmarks.
https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul... - also comprehensive, but primarily assesses Python and JavaScript.
https://evalplus.github.io/leaderboard.html - which I think is a better take for comparing models you intend to run locally, as you can evaluate performance, operability and size in one visualisation.
Best of luck and I would love to know which models & benchmarks you choose and why.
When this 70B model gets quantized you should be able to run it fine on your 4090. Check out 'TheBloke' on Hugging Face for quantized weights, and use llama.cpp to run the GGUF files.
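A rough back-of-envelope for what actually fits in a 4090's 24 GB of VRAM (the bits-per-weight figures are assumptions for common GGUF quant types, and KV cache and runtime overhead are ignored):

```python
def quantized_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (ignores KV cache and overhead)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Approximate bits/weight for common GGUF quant types (assumed figures)
print(f"70B @ ~4.8 bpw (Q4_K_M-ish): ~{quantized_size_gb(70, 4.8):.0f} GB")
print(f"70B @ ~2.6 bpw (Q2_K-ish):   ~{quantized_size_gb(70, 2.6):.0f} GB")
print(f"34B @ ~4.8 bpw:              ~{quantized_size_gb(34, 4.8):.0f} GB")
```

So at ~4-5 bits/weight a 70B still overflows a single 24 GB card; llama.cpp's `--n-gpu-layers` option lets you keep the remaining layers on the CPU, and the 34B variants fit more comfortably.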
This looks potentially interesting if it can be run locally on, say, an M2 Max or similar, and if there's an IDE plugin to do the Copilot thing.
Anything that saves me time writing “boilerplate” or figuring out the boring problems on projects is welcome - so I can expend the organic compute cycles on solving the more difficult software engineering tasks :)
It's aimed at OpenAI's moat. Making sure they don't accumulate too much of one. No one actually has to use this, it just needs to be clear that LLM as a service won't be super high margin because competition can simply start building on Meta's open source releases.
This is targeted at GPU rental services like RunPod, as well as API providers such as Together AI. Together.ai charges $0.90/1M tokens for 70B models. https://www.together.ai/pricing
There are companies like Phind that offer copilot-like services using finetuned versions of CodeLlama-34B, which imo are actually good. But I don't know whether a model this large will be used in that context.
Meta doesn't have an AI "product" competing with OpenAI, Google's Bard, etc. But they use AI extensively internally. This is roughly a byproduct of their internal AI work that they're already doing, and fostering open source AI development puts incredible pressure on the AI products and their owners.
If Meta can help prevent there from being an AI monopoly company, but rather an ecosystem of comparable products, then they avoid having another threatening tech giant competitor, as well as preventing their own AI work and products from being devalued.
They're commoditizing the ability to generate viral content, which is the carrot that keeps people's eyeballs on the hedonic treadmill. More eyeball-time = more ad placements = more money.
On the advertiser side, they're commoditizing the ability for companies to write more persuasively-targeted ads. Higher click-through rates = more money.
[edit]: For models that generate code instead of content (TFA), it's obviously a different story. I don't have a good grip on that story, beyond "they're using their otherwise-idle GPU farms to buy goodwill and innovate on training methods".
AI seems like the Next Big Thing. Meta have put themselves at the center of the most exciting growth area in technology by releasing models they have trained.
They've gained an incredible amount of influence and mindshare.
If they hadn't opened the models, the Llama series would just be a few sub-GPT-4 models. Opening the models has created a wealth of development that has built upon them.
Alone, it was unlikely they would become a major player in a field that might be massively important. With a large community building upon their base they have a chance to influence the direction of development and possibly prevent a proprietary monopoly in the hands of another company.
My opinion is that Meta is taking the model out of the secret-sauce formula. That leaves hardware and data for training as the barriers to entry. If you don't need to develop your own model, then all you need is data and hardware, which lowers the barrier to entry. The lower the barrier, the more GenAI startups, and the more potential data customers for Meta, since they certainly have large, curated datasets for sale.
I think a big part of it is just that they have a big AI lab. I don't know the genesis of it, but it has for years been a big contributor: see PyTorch, models like SEER, and their place as one of the dominant publishers at big conferences.
Maybe now their leadership wants to push for practicality so they don't end up like Google (also a research powerhouse but failing to convert to popular advances) so they are publicly pushing strong LLMs.
Meta's end goal is to have better AI than everyone else; in the medium term that means they want to have the best foundational models. How does this help?
1. They become an attractive place for AI researchers to work, and can bring in better staff.
2. They make it less appealing for startups to enter the space and build large foundation models (Meta would prefer 1,000 startups pop up and play around with other people's models than 1,000 startups popping up and trying to build better foundational models).
3. They put cost pressure on AI-as-a-service providers. When Llama exists, it's harder for companies to make a profit just selling access to models. Along with 2, this further limits the possibility of startups entering the foundational model space, because the path to monetization/breakeven is more difficult.
Essentially this puts Meta, Google, and OpenAI/Microsoft (Anthropic/Amazon as a number four maybe) as the only real players in the cutting edge foundational model space. Worst case scenario they maintain their place in the current tech hegemony as newcomers are blocked from competing.
Aside from the "positive" explanations offered in the sibling comments, there's also a "negative" one: other AI companies that try to enter the fray will not be able to compete with Meta's open offerings. After all, why would you pay a company to undertake R&D on building their own models when you can just finetune a Llama?
Facebook went all in on the metaverse and turned into Meta; quite rightly, the market looked at what they produced for tens of billions and decided their company was worthless.
Then AI sprang to the front pages, and any CEO who stood up and said "AI" was rewarded with a 10x stock price. The unloved stepchild that was the ML team became the A team, and the metaverse team has been sent to the naughty step. Facebook/Meta have no actual customer-facing use for AI, unlike Microsoft/Google/GitHub, but they like a good stonk price rise, and so what we see is their strategy to stay in the AI game and stay relevant.
It turns out it is pretty good for the rest of us (possibly the first time Facebook has given something positive to humanity), as we get shiny toys to play with.
Part of it is that they already had this developed for years (see alt text on uploaded images for example), and they want to ensure that new regulations don't hamper any of their future plans.
It costs them nothing to open it up, so why not. Kinda like all the rest of their GitHub repos.
Meta still sit on all the juicy user data that they want to use AI on but they don’t know how. They are crowdsourcing development of applications and tooling.
Meta releases a model. Joe builds a cool app with it, earns some internet points and, if lucky, a few hundred bucks. Meta copies the app, multiplies Joe's success story across 1 billion users, and earns a few million bucks.
Meta sees this as the way to improve their AI offerings faster than others and, eventually, better than others.
Instead of a small group of engineers working on this inside Meta, the Open Source community helps improve it.
They have a history of this with React, PyTorch, HHVM, etc. All of these have gotten better as open source projects, faster than Meta alone would have managed.
Any good resources or suggestions for a system/pre-prompt for general coding, or when targeting a specific language? I.e., when using CodeLlama and working in TypeScript, Ruby, Rust, Elixir, etc., is there a universal prompt that gives good results, or would you want to adjust the prompt depending on the language you're targeting?
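Not aware of a universal prompt, but for CodeLlama's instruct variants the wrapper format itself is fixed (the Llama 2 chat template), so one option is a shared template with a per-language system message swapped in. A minimal sketch, where the template tokens follow the published format but the prompt wording is just an illustrative guess, not a tested recommendation:

```python
# Hypothetical per-language system prompts; tune these to taste
SYSTEM_BY_LANG = {
    "typescript": "You are an expert TypeScript engineer. Prefer strict typing and idiomatic code.",
    "rust": "You are an expert Rust engineer. Prefer safe, idiomatic Rust and avoid unwrap() in examples.",
}

def instruct_prompt(lang: str, user_msg: str) -> str:
    """Wrap a request in the Llama-2-style chat template that CodeLlama instruct models expect."""
    system = SYSTEM_BY_LANG.get(lang, "You are an expert programmer.")
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_msg} [/INST]"

print(instruct_prompt("rust", "Write a function that parses a semver string."))
```

The base (non-instruct) completion models don't use this template at all; for those, the prompt is just code context, so the per-language tuning happens in comments and surrounding code instead.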
Can anyone tell me what kind of hardware setup would be needed to fine-tune something like this? Would you need a cluster of GPUs? What cluster size and GPU spec do you think is reasonable (e.g. in terms of VRAM per GPU)?
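As a rough sizing sketch using standard rules of thumb (not measured numbers): full fine-tuning with mixed-precision AdamW costs on the order of 16 bytes per parameter, while QLoRA keeps the base model frozen in 4-bit and only trains small adapters:

```python
def full_finetune_vram_gb(n_params_billion: float) -> float:
    """~16 bytes/param: fp16 weights (2) + grads (2) + fp32 master weights and Adam moments (12)."""
    return n_params_billion * 16

def qlora_base_vram_gb(n_params_billion: float) -> float:
    """Frozen 4-bit base weights; LoRA adapters and their optimizer state are comparatively tiny."""
    return n_params_billion * 4 / 8

print(f"70B full fine-tune: ~{full_finetune_vram_gb(70):.0f} GB (plus activations)")
print(f"70B QLoRA:          ~{qlora_base_vram_gb(70):.0f} GB base (plus adapters/activations)")
```

So a full fine-tune of a 70B means a multi-node cluster with FSDP/DeepSpeed-style sharding across many 80 GB GPUs, while a QLoRA fine-tune is plausible on one or two 80 GB cards (or a 24 GB card for the 7B/13B variants).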
Can anyone explain why big tech companies are racing to release open source models?
If a model is free and open source, how will they earn money, and how will they compete with others?
mvkel | 2 years ago
Not to diminish the value of the contribution, but "commitment" is an interesting word choice.
martingoodson | 2 years ago
I highly recommend watching it.
pandominium | 2 years ago
I think Copilot is already heavily subsidized by Microsoft.
Let's say you use Copilot around 30% of your daily work hours. How many kWh does an open source 7B or 13B model then use in a month on one 4090?
EDIT: I think for a 13B at 30% use per day it comes to around $30/mo on the energy bill. So an even smaller but still capable model could probably beat the Copilot monthly subscription.
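For what it's worth, here is one set of assumptions that lands near that figure (all three numbers are guesses: full ~450 W draw, "30% per day" read against a 24-hour day, $0.30/kWh):

```python
power_kw = 0.45                   # RTX 4090 near full load (assumption)
hours_per_month = 24 * 0.30 * 30  # 30% duty cycle over a 30-day month (assumption)
usd_per_kwh = 0.30                # electricity price (assumption; varies widely by region)

cost = power_kw * hours_per_month * usd_per_kwh
print(f"~${cost:.0f}/month")  # → ~$29/month
```

With a lighter duty cycle or a power-limited card the number drops fast, which is how a small local model can undercut a paid subscription.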
Havoc | 2 years ago
Cool nonetheless
bk146 | 2 years ago
(Please don't say "commoditize your complement" without explaining what exactly they're commoditizing...)
pchristensen | 2 years ago
Think of it like Google releasing a web browser.
Too | 2 years ago
Joe is happy, Meta is happy. Everybody is happy.
theGnuMe | 2 years ago
Essentially, you mitigate IP claims and reduce vendor dependency.
https://eightify.app/summary/technology-and-software/the-imp...