top | item 42825375


popinman322 | 1 year ago

DeepSeek was built on the foundations of public research, a major part of which is the Llama family of models. Prior to Llama, open-weights LLMs were considerably less performant; without Llama we might not have gotten Mistral, Qwen, or DeepSeek. This isn't meant to diminish DeepSeek's contributions, however: they've been doing great work on mixture-of-experts models and really pushing the community forward on that front. And, obviously, they've achieved incredible performance.

Llama models are also still best in class for certain tasks that require local data processing. They also hold positions in the top 25 of the lmarena leaderboard (for what that's worth these days, given suspected gaming of the platform), which places them in competition with some of the best models in the world.

But, going back to my first point, Llama set the stage for almost all open-weights models that came after. They spent millions on training runs whose artifacts will never see the light of day, testing theories that are too expensive for smaller players to contemplate exploring.

Pegging Llama as mediocre, or a waste of money (as implied elsewhere), feels incredibly myopic.


Philpax | 1 year ago

As far as I know, Llama's architecture has always been quite conservative: it has not changed that much since LLaMA. Most of their recent gains have been in post-training.

That's not to say their work is unimpressive or unworthy - as you say, they've facilitated much of the open-source ecosystem and have been an enabling factor for many - but that work has been about making models accessible, not necessarily pushing the frontier of what's actually possible. DeepSeek has shown us what happens when you do the latter.

wiz21c | 1 year ago

So Zuck had at least one good idea, useful for all of us!

lvl155 | 1 year ago

I never said Llama is mediocre. I said the teams they put together are full of people chasing money, and the billions Meta is burning are going straight to mediocrity. They're bloated. And we know exactly why Meta is doing this, and it's not because they have some grand scheme to build up AI. It's to keep these people away from their competition. Same with the billions in GPU spend: they want to suck up resources away from the competition. That's their entire plan. Do you really think Zuck has any clue about AI? He was never serious and instead built wonky VR prototypes.

sangnoir | 1 year ago

> And we know exactly why Meta is doing this and it’s not because they have some grand scheme to build up AI. It’s to keep these people away from their competition

I don't see how you can confidently say this when AI researchers and engineers are remunerated very well across the board and people are moving between companies all the time. If the plan is as you describe it, it is clearly not working.

Zuckerberg seems confident they'll have an AI equivalent of a mid-level engineer later this year. Can you imagine how much money Meta could save by replacing a fraction of its (well-paid) engineers with fixed capex plus an electric bill?

yodsanklai | 1 year ago

> I said the teams they put together are full of people chasing money.

Does that mean they are mediocre? It's not like OpenAI or Anthropic pay their engineers peanuts. Competition to attract top talent is fierce.

oezi | 1 year ago

In contrast to the social media industry (or word processors, or mobile phones), the market for AI solutions does not seem to have an inherent moat or network effects that keep users locked in to the market leader.

Rather, with AI, capitalism seems to be working at its best, with competitors to OpenAI building solutions that take market share and improve products. Zuck can try monopoly plays all day, but I don't think it will work this time.