
Meta Llama 3

2199 points | bratao | 1 year ago | llama.meta.com

923 comments

[+] bbig|1 year ago|reply
They've got a console for it as well, https://www.meta.ai/

And announcing a lot of integration across the Meta product suite, https://about.fb.com/news/2024/04/meta-ai-assistant-built-wi...

Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model. We'll see how it fares in the LLM Arena.

[+] CuriouslyC|1 year ago|reply
They didn't compare against the best models because they were trying to do "in class" comparisons, and the 70B model is in the same class as Sonnet (which they do compare against) and GPT-3.5 (which is much worse than Sonnet). If they're beating Sonnet, that means they're going to be within striking distance of Opus and GPT-4 for most tasks, with the only major difference probably arising in extremely difficult reasoning benchmarks.

Since Llama is open source, we're going to see fine-tunes and LoRAs, unlike with Opus.
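On why LoRA fine-tunes are so accessible: instead of updating a full d×d weight matrix, LoRA trains two low-rank factors, shrinking the trainable-parameter count dramatically. A back-of-the-envelope sketch (4096 is Llama 3 8B's hidden size; the rank of 16 is an arbitrary illustrative choice):

```python
# LoRA replaces the update to a d x d weight matrix with two low-rank
# factors A (d x r) and B (r x d), so only 2*d*r parameters are trained
# per adapted matrix instead of d*d.

def full_params(d: int) -> int:
    """Trainable parameters for full fine-tuning of one d x d matrix."""
    return d * d

def lora_params(d: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on one d x d matrix."""
    return 2 * d * r

d, r = 4096, 16  # hidden size (Llama 3 8B) and an illustrative LoRA rank
print(full_params(d))                        # 16777216 weights per matrix
print(lora_params(d, r))                     # 131072 trainable weights
print(full_params(d) // lora_params(d, r))   # 128x fewer trainable params
```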

[+] LrnByTeach|1 year ago|reply
Losers & winners from Llama-3-400B matching Claude 3 Opus, etc.

Losers:

- Nvidia stock: a lid on GPU growth over the coming year or two, as nation states use Llama-3/Llama-4 instead of spending $$$ on GPUs for their own models; same goes for big corporations.

- OpenAI & Sam: hard to raise the speculated $100 billion, given that GPT-4/GPT-5-level advances are now visible elsewhere.

- Google: a diminished AI-superiority posture.

Winners:

- AMD, Intel: these companies can focus on chips for AI inference instead of falling behind Nvidia's superior training GPUs.

- Universities & the rest of the world: can build on top of Llama-3.

[+] nickthegreek|1 year ago|reply
And they even allow you to use it without logging in. Didn't expect that from Meta.
[+] josh-sematic|1 year ago|reply
They also stated that they are still training larger variants that will be more competitive:

> Our largest models are over 400B parameters and, while these models are still training, our team is excited about how they’re trending. Over the coming months, we’ll release multiple models with new capabilities including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities.

[+] matsemann|1 year ago|reply
> Meta AI isn't available yet in your country

Where is it available? I got this in Norway.

[+] geepytee|1 year ago|reply
Also added Llama 3 70B to our coding copilot https://www.double.bot if anyone wants to try it for coding within their IDE and not just chat in the console
[+] dawnerd|1 year ago|reply
Tried a few queries and was surprised how fast it responded vs how slow chatgpt can be. Responses seemed just as good too.
[+] schleck8|1 year ago|reply
> Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model

Yeah, almost like comparing a 70b model with a 1.8 trillion parameter model doesn't make any sense when you have a 400b model pending release.

[+] dazuaz|1 year ago|reply
I'm based on LLaMA 2, which is a type of transformer language model developed by Meta AI. LLaMA 2 is a more advanced version of the original LLaMA model, with improved performance and capabilities. I'm a specific instance of LLaMA 2, trained on a massive dataset of text from the internet, books, and other sources, and fine-tuned for conversational AI applications. My knowledge cutoff is December 2022, and I'm constantly learning and improving with new updates and fine-tuning.
[+] jamesgpearce|1 year ago|reply
That realtime `/imagine` prompt seems pretty great.
[+] throwup238|1 year ago|reply
> And announcing a lot of integration across the Meta product suite, ...

That's ominous...

[+] krackers|1 year ago|reply
Are there any stats on whether Llama 3 beats ChatGPT 3.5 (the free one you can use)?
[+] typpo|1 year ago|reply
Public benchmarks are broadly indicative, but devs really should run custom benchmarks on their own use cases.

Replicate created a Llama 3 API [0] very quickly. This can be used to run simple benchmarks with promptfoo [1] comparing Llama 3 vs Mixtral, GPT, Claude, and others:

  prompts:
    - 'Answer this programming question concisely: {{ask}}'

  providers:
    - replicate:meta/meta-llama-3-8b-instruct
    - replicate:meta/meta-llama-3-70b-instruct
    - replicate:mistralai/mixtral-8x7b-instruct-v0.1
    - openai:chat:gpt-4-turbo
    - anthropic:messages:claude-3-opus-20240229

  tests:
    - vars:
        ask: Return the nth element of the Fibonacci sequence
    - vars:
        ask: Write pong in HTML
    # ...
Still testing things but Llama 3 8b is looking pretty good for my set of random programming qs at least.

Edit: ollama now supports Llama 3 8b, making it easy to run this eval locally.

  providers:
    - ollama:chat:llama3
[0] https://replicate.com/blog/run-llama-3-with-an-api

[1] https://github.com/typpo/promptfoo

[+] modeless|1 year ago|reply
Llama 3 70B has debuted on the famous LMSYS chatbot arena leaderboard at position number 5, tied with Claude 3 Sonnet, Bard (Gemini Pro), and Command R+, ahead of Claude 3 Haiku and older versions of GPT-4.

The score still has a large uncertainty so it will take a while to determine the exact ranking and things may change.

Llama 3 8B is at #12 tied with Claude 1, Mixtral 8x22B, and Qwen-1.5-72B.

These rankings seem very impressive to me, on the most trusted benchmark around! Check the latest updates at https://arena.lmsys.org/

Edit: On the English-only leaderboard Llama 3 70B is doing even better, hovering at the very top with GPT-4 and Claude Opus. Very impressive! People seem to be saying that Llama 3's safety tuning is much less severe than before so my speculation is that this is due to reduced refusal of prompts more than increased knowledge or reasoning, given the eval scores. But still, a real and useful improvement! At this rate, the 400B is practically guaranteed to dominate.
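On the score uncertainty mentioned above: the arena leaderboard is fit from pairwise human votes with an Elo-style rating, so a new model's rank stays fuzzy until it accumulates many votes. A minimal sketch of how a rating gap maps to an expected win rate (the ratings below are made up for illustration):

```python
def expected_win_rate(r_a: float, r_b: float) -> float:
    """Elo expected score for model A against model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Equal ratings -> a coin flip.
print(expected_win_rate(1200, 1200))            # 0.5

# A 50-point Elo gap is only a ~57% expected win rate, so many votes
# are needed before two nearby models can be reliably separated.
print(round(expected_win_rate(1250, 1200), 3))  # 0.571
```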

[+] nathanh4903|1 year ago|reply
I tried generating a Chinese rap song, and it did generate a pretty good rap. However, upon completion, it deleted the response and showed:

> I don’t understand Chinese yet, but I’m working on it. I will send you a message when we can talk in Chinese.

I tried some other languages and saw the same thing: it will generate non-English text, but once it's done, the response is deleted and replaced with that message.

[+] hermesheet|1 year ago|reply
Lots of great details in the blog: https://ai.meta.com/blog/meta-llama-3/

Looks like there's a 400B version coming up that will be much better than GPT-4 and Claude Opus too. Decentralization and OSS for the win!

[+] eigenvalue|1 year ago|reply
I just want to express how grateful I am that Zuck and Yann and the rest of the Meta team have adopted an open approach and are sharing the model weights, the tokenizer, information about the training data, etc. They, more than anyone else, are responsible for the explosion of open research and improvement that has happened with things like llama.cpp that now allow you to run quite decent models locally on consumer hardware in a way that you can avoid any censorship or controls.

Not that I even want to make inference requests that would run afoul of the controls put in place by OpenAI and Anthropic (I mostly use it for coding stuff), but I hate the idea of this powerful technology being behind walls and having gate-keepers controlling how you can use it.

Obviously, there are plenty of people and companies out there that also believe in the open approach. But they don't have hundreds of billions of dollars of capital and billions in sustainable annual cash flow and literally ten(s) of billions of dollars worth of GPUs! So it's a lot more impactful when they do it. And it basically sets the ground rules for everyone else, so that Mistral now also feels compelled to release model weights for most of their models.

Anyway, Zuck didn't have to go this way. If Facebook were run by "professional" outside managers of the HBS/McKinsey ilk, I think it's quite unlikely that they would be this open with everything, especially after investing so much capital and energy into it. But I am very grateful that they are, and think we all benefit hugely from not only their willingness to be open and share, but also to not use pessimistic AI "doomerism" as an excuse to hide the crown jewels and put it behind a centralized API with a gatekeeper because of "AI safety risks." Thanks Zuck!

[+] modeless|1 year ago|reply
I was curious how the numbers compare to GPT-4 in the paid ChatGPT Plus, since they don't compare directly themselves.

             Llama 3 8B   Llama 3 70B   GPT-4
  MMLU       68.4         82.0          86.5
  GPQA       34.2         39.5          49.1
  MATH       30.0         50.4          72.2
  HumanEval  62.2         81.7          87.6
  DROP       58.4         79.7          85.4
Note that the free version of ChatGPT that most people use is based on GPT-3.5 which is much worse than GPT-4. I haven't found comprehensive eval numbers for the latest GPT-3.5, however I believe Llama 3 70B handily beats it and even the 8B is close. It's very exciting to have models this good that you can run locally and modify!

GPT-4 numbers from https://github.com/openai/simple-evals for gpt-4-turbo-2024-04-09 (the ChatGPT model)
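One rough way to read these numbers: for each benchmark, compute how much of the 8B-to-GPT-4 gap the 70B model closes (scores copied from the table above):

```python
# (Llama 3 8B, Llama 3 70B, GPT-4) scores from the table above.
scores = {
    "MMLU":      (68.4, 82.0, 86.5),
    "GPQA":      (34.2, 39.5, 49.1),
    "MATH":      (30.0, 50.4, 72.2),
    "HumanEval": (62.2, 81.7, 87.6),
    "DROP":      (58.4, 79.7, 85.4),
}

for name, (b8, b70, gpt4) in scores.items():
    # Fraction of the 8B -> GPT-4 gap that the 70B closes.
    closed = (b70 - b8) / (gpt4 - b8)
    print(f"{name}: 70B closes {closed:.0%} of the gap")
```

On most benchmarks the 70B covers roughly three quarters of the distance to GPT-4; MATH and GPQA remain the largest gaps.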

[+] bbig|1 year ago|reply
Zuck has an interview out for it as well, https://twitter.com/dwarkesh_sp/status/1780990840179187715
[+] paxys|1 year ago|reply
Very interesting part around 5 mins in where Zuck says that they bought a shit ton of H100 GPUs a few years ago to build the recommendation engine for Reels to compete with TikTok (2x what they needed at the time, just to be safe), and now they are accidentally one of the very few companies out there with enough GPU capacity to train LLMs at this scale.
[+] modeless|1 year ago|reply
Seems like a year or two of MMA has done way more for his charisma than whatever media training he's done over the years. He's a lot more natural in interviews now.
[+] chaoz_|1 year ago|reply
I can't express how good Dwarkesh's podcast is in general.
[+] minimaxir|1 year ago|reply
The model card has the benchmark results relative to other Llama models including Llama 2: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md...

The dramatic performance increase of Llama 3 relative to Llama 2 (even Llama 2 13B!) is very impressive. Doubling the context window to 8k will open a lot of new opportunities too.

[+] observationist|1 year ago|reply
https://github.com/meta-llama/llama3/blob/main/LICENSE

Llama is not open source. It's corporate freeware with some generous allowances.

Open source licenses are a well-defined thing. Meta marketing saying otherwise doesn't mean they get to usurp the well-understood, commonly used meaning of the term "open source."

https://opensource.org/license

Nothing about Meta's license is open source. It's a carefully constructed legal agreement intended to prevent any meaningful encroachment by anyone, ever, into any potential Meta profit, and to disavow liability to prevent reputational harm in the case of someone using their freeware for something embarrassing.

If you use it against the license anyway, you'll just have to hope you never get successful enough that it becomes more profitable to sue you and take your product away than it would be annoying to prosecute you under their legal rights. When the threshold between annoying and profitable is crossed, Meta's lawyers will start sniping and acquiring users of their IP.

[+] jph00|1 year ago|reply
> "Nothing about Meta's license is open source. It's a carefully constructed legal agreement intended to prevent any meaningful encroachment by anyone, ever, into any potential Meta profit, and to disavow liability to prevent reputational harm in the case of someone using their freeware for something embarrassing."

You seem to be making claims that have little connection to the actual license.

The license states you can't use the model if, at the time Llama 3 was released, you had >700 million monthly active users. It also says you can't use it for illegal/military/etc uses. Other than that, you can use it as you wish.

[+] bevekspldnw|1 year ago|reply
I don’t understand how the idea of open source become some sort of pseudo-legalistic purity test on everything.

Models aren’t code, some of the concepts of open source code don’t map 1:1 to freely available models.

In spirit I think this is “open source”, and I think that’s how the majority of people think.

Turning everything into some sort of theological debate takes away a lot of credit that Meta deserves. Google isn’t doing this. OpenAI sure as fuck isn’t.

[+] freehorse|1 year ago|reply
What are the practical use cases where the license prohibits people from using Llama models? There are plenty of startups and companies that have already built their business on Llamas (e.g. phind.com). I do not see the issues that you assume exist.

If you get so successful that you cannot use it anymore (having 10% of Earth's population as clients), you can probably train your own models already.

[+] CuriouslyC|1 year ago|reply
Models are mostly fungible, if meta decided to play games it's not too hard to switch models. I think this is mostly a CYA play.
[+] robertlagrant|1 year ago|reply
What is "source" regarding an LLM? Public training data and initial parameters?
[+] stale2002|1 year ago|reply
Yes or no: do you concede that for almost everyone, none of what you said matters, that almost everyone can use Llama 3 for their use case, and that basically nobody is going to have to worry about being sued, other than maybe Google or equivalent?

You are using all these scary words without saying the obvious, which is that for almost everyone, none of that matters.

[+] tarruda|1 year ago|reply
> When the threshold between annoying and profitable is crossed, Meta's lawyers will start sniping and acquiring users of their IP.

I'm curious: given that the model will probably be hosted in a private server, how would meta know or prove that someone is using their model against the license?

[+] KingOfCoders|1 year ago|reply
"Llama is not open source."

This is interesting. Can you point me to an OSI discussion what would constitute an open source license for LLMs? Obviously they have "source" (network definitions) and "training data" and "weights".

I'm not aware of any such discussion.

[+] doctoboggan|1 year ago|reply
I am always excited to see these open-weight models released; I think it's very good for the ecosystem and definitely has its place in many situations.

However, since I use LLMs as a coding assistant (mostly via "rubber duck" debugging and new-library exploration), I really don't want to use anything other than the absolute best in class available now. That continues to be GPT-4-turbo (or maybe Claude 3).

Does anyone know if there is any model out there that can be run locally and compete with GPT4-turbo? Or am I asking for something that is impossible?

[+] pellucide|1 year ago|reply
From the article

>We made several new observations on scaling behavior during the development of Llama 3. For example, while the Chinchilla-optimal amount of training compute for an 8B parameter model corresponds to ~200B tokens, we found that model performance continues to improve even after the model is trained on two orders of magnitude more data. Both our 8B and 70B parameter models continued to improve log-linearly after we trained them on up to 15T tokens. Larger models can match the performance of these smaller models with less training compute, but smaller models are generally preferred because they are much more efficient during inference.

Can someone experienced please explain this? Does this mean a lean model with more training time and/or more (or better) training data will perform better than a fat model?
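The numbers in the quote can be sketched out directly: the Chinchilla heuristic is roughly 20 training tokens per parameter (that rule of thumb comes from the Chinchilla paper, not this thread), and Meta trained roughly two orders of magnitude past that point:

```python
# Chinchilla rule of thumb: compute-optimal training uses ~20 tokens
# per parameter. Meta trained far past that and still saw log-linear gains.
params = 8e9                       # Llama 3 8B
chinchilla_tokens = 20 * params    # 160B tokens, near the ~200B quoted above
actual_tokens = 15e12              # the 15T tokens Meta actually used

print(chinchilla_tokens / 1e9)            # 160.0 (billions of tokens)
print(actual_tokens / chinchilla_tokens)  # 93.75 -> ~two orders of magnitude
```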

[+] nojvek|1 year ago|reply
I'm a big fan of various AI companies taking different approaches. OpenAI keeping it close to their hearts but have great developer apis. Meta and Mistral going open weights + open code. Anthropic and Claude doing their thing.

Competition is a beautiful thing.

I am half excited and half scared that AGI is our generation's space race.

I hope we can solve the big human problems, instead of more scammy ads and videos.

So far AI has been more hype than substance.

[+] aussieguy1234|1 year ago|reply
"You’ll also soon be able to test multimodal Meta AI on our Ray-Ban Meta smart glasses."

Now this is interesting. I've been thinking for some time now that traditional computer/smartphone interfaces are on the way out for all but a few niche applications.

Instead, everyone will have their own AI assistant, which you'll interact with naturally the same way as you interact with other people. Need something visual? Just ask for the latest stock graph for MSFT for example.

We'll still need traditional interfaces for some things like programming, industrial control systems etc...

[+] buildbot|1 year ago|reply
Quick thoughts -

Major arch changes are not that major, mostly GQA and tokenizer improvements. Tokenizer improvement is an under-explored domain IMO.

15T tokens is a ton!

400B model performance looks great, can’t wait for that to be released. Might be time to invest in a Mac studio!

OpenAI probably needs to release GPT-5 soon to convince people they are still staying ahead.
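On the GQA change mentioned above: the practical payoff is a much smaller KV cache at inference time, because keys and values are stored per KV head rather than per query head. A sketch using Llama 3 70B's published shape (80 layers, 64 query heads, 8 KV heads, head dim 128, fp16 at 2 bytes per value):

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_val: int = 2) -> int:
    """Bytes of KV cache per sequence token (the 2* is for keys + values)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_val

mha = kv_cache_bytes_per_token(80, 64, 128)  # if all 64 heads kept their own KV
gqa = kv_cache_bytes_per_token(80, 8, 128)   # GQA: 8 shared KV heads

print(mha)         # 2621440 bytes (~2.5 MiB) per token
print(gqa)         # 327680 bytes (~320 KiB) per token
print(mha // gqa)  # 8x smaller cache, so much longer contexts fit in VRAM
```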

[+] mmoskal|1 year ago|reply
Interesting: the 8B model was trained for 1.3M GPU-hours and the 70B for 6.4M GPU-hours, at 700W per GPU. Assuming $0.05/kWh (WA price), that's $46k and $224k. Even allowing for cooling, CPUs, and more expensive power wherever they are running this, it's still well under $1M in power. I somehow thought it would be much more.

The Nvidia bill is another matter: assuming 5-year amortization and $45k per H100, it works out to about $1/GPU-hour, so $8M or so.
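The arithmetic in the comment above can be checked directly (same assumptions: 700 W per GPU, $0.05/kWh, and roughly $1/GPU-hour of amortized H100 cost):

```python
RATE_KWH = 0.05   # $/kWh, the WA price assumed above
GPU_KW = 0.7      # 700 W per H100

def power_cost(gpu_hours: float) -> float:
    """Electricity cost in dollars for a given number of GPU-hours."""
    return gpu_hours * GPU_KW * RATE_KWH

print(round(power_cost(1.3e6)))      # 45500 -> the ~$46k for the 8B
print(round(power_cost(6.4e6)))      # 224000 -> the $224k for the 70B

# Hardware: $45k per H100 amortized over 5 years (~43,800 hours)
# is about $1/GPU-hour, so the GPU bill dwarfs the power bill.
print(round(45_000 / (5 * 8760), 2))  # 1.03 $/GPU-hour
print(round((1.3e6 + 6.4e6) * 1.0))   # 7700000 -> "about $8M"
```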

[+] seveibar|1 year ago|reply
Just a quick observation: it seems not to mention commercial products (or at least to be biased against them). I tried asking "what are popular design tools with an infinite canvas" on both meta.ai and OpenAI. OpenAI returned what you would expect, Figma, Sketch, etc., but Meta AI only returned free/open-source software: https://x.com/seveibar/status/1781042926430437404
[+] yogorenapan|1 year ago|reply
I actually like that. I know they aren't the "best" responses, but as defaults I would be more suspicious if it gave paid tools. I tested it and you can just ask for commercial tools if you want.
[+] kyle_grove|1 year ago|reply
Interesting, I'm playing with it and I asked it what SIEMs are and it gave examples of companies/solutions, including Splunk and RSA Security Analytics.
[+] sergiotapia|1 year ago|reply
The amount of open source stuff Facebook (Meta) has given us over the years is astounding: PyTorch, React, React Native, GraphQL, Cassandra, Zstandard. Commoditized VR, love my Quest 3. Just an incredible track record. We're lucky they release all this stuff for free. The Zuck is one of a kind.