top | item 42849018

elijahbenizzy | 1 year ago

We’ve just learned that it’s possible to do AI on less compute (DeepSeek). If OpenAI’s problem is that it doesn’t scale, then I’d argue that in the long run, if you believe in their ability to do research, the news this week is a very bullish sign.

IMO the equivalent of Moore’s law for AI (in both software and hardware development) is baked into the price, which doesn’t make the valuation all that crazy.

refulgentis|1 year ago

> We’ve just learned that it’s possible to do AI on less compute (deepseek).

There's a huge motte-and-bailey thing with the DeepSeek conversation, where the bailey is "It only took $5.5 million!*" (* for exactly one training run of one of several models, at dirt-cheap per-hour spot prices for H100s) and the motte is all sorts of stuff.

Truth is, one run for one model took 2048 GPUs full-time for 2 months, and in my experience with FAANG ML, that means it took 6 months part-time and another 1.5-2.5 runs went absolutely nowhere.
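For scale, a back-of-envelope check of that claim against the headline figure (the ~$2/GPU-hour spot rate is an assumption here, not a number from the thread):

```python
# Rough sanity check: does "2048 GPUs for 2 months" line up with "$5.5M"?
gpus = 2048
hours = 2 * 30 * 24            # ~2 months of full-time running
gpu_hours = gpus * hours       # total GPU-hours for the one run
rate = 2.0                     # assumed spot price in $/GPU-hour
cost = gpu_hours * rate

print(gpu_hours)  # 2949120
print(cost)       # 5898240.0 -- same ballpark as the quoted $5.5M
```

So the headline number is plausible as raw rental cost for one run, which is exactly the "bailey" point: it excludes failed runs, experimentation, salaries, and owned hardware.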

_heimdall|1 year ago

> is baked into the price, which doesn’t make the valuation all too crazy.

Valuations for most large companies have been crazy for a while now. No one values a company based on fundamentals anymore; it's all pure gambling on future predictions.

This isn't unique to OpenAI by any means, but they are a good example. Last I checked their revenue to valuation multiplier was in the range of 42X. That's crazy.

Animats|1 year ago

Does anyone know how Deepseek does it yet?

drakenot|1 year ago

(Summary from Reddit)

- fp8 instead of fp32 precision training = 75% less memory

- multi-token prediction to vastly speed up token output

- Mixture of Experts (MoE), so that inference only uses parts of the model, not the entire model (~37B parameters active at a time, not the full 671B), which increases efficiency

- PTX (basically low-level assembly code for Nvidia GPUs) hacking to pump as much performance as possible out of their older H800 GPUs

Then, the big innovation of R1 and R1-Zero was finding a way to utilize reinforcement learning within their LLM training.
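The MoE bullet above is the easiest one to illustrate. This is not DeepSeek's code, just a toy numpy sketch of the idea: a router scores all experts per token, but only the top-k actually run, so active parameters are a small fraction of total parameters (~37B of 671B in DeepSeek's case). Shapes and the routing function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, k=2):
    """Toy top-k Mixture-of-Experts layer: only k of the experts
    run per token, so active parameters << total parameters."""
    logits = x @ gate_w                  # router score for each expert
    top = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the chosen k only
    # Weighted sum of just the selected experts' outputs
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16
gate_w = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))  # one matrix per expert

y = moe_forward(rng.standard_normal(d), gate_w, experts, k=2)
print(y.shape)  # (8,)
```

Here only 2 of 16 expert matrices are multiplied per token, which is the efficiency win the summary describes.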

SlightlyLeftPad|1 year ago

Honestly, I’m not sure I’m completely sold on the value of LLMs long term but this is the most realistic and reasonable take I’ve read on this post so far.

If anything, it’s a downward adjustment in the cost implications, but it could actually unlock exponential improvements on a shorter time horizon than expected because of that. Investors getting scared is probably a good opportunity to buy in.

markvdb|1 year ago

Bullish on the use. Bearish on the profit margins for the big players.

If (big if!) I understand correctly, the ceiling for edge/local/offline AI has just blown off.

FeepingCreature|1 year ago

It's always been possible to "do (worse) AI on less compute". We've had years of open models! I also don't understand how anyone can see this as anything but good news for OpenAI. The ultimate value proposition of AI has always depended on whether it stretches to AGI and beyond, and R1 demonstrates that there's several orders of magnitude of hardware overhang. This makes it easier for OpenAI to succeed, not harder, because it makes it less likely that they'll scale to their financial limits and still fail to surpass humans.

addicted|1 year ago

The point is that this was developed outside of OpenAI.

So the real question is why does anyone believe that OpenAI will bring AGI when actual innovation was happening in some hedge fund in China while OpenAI was going on an international tour trying to drum up a trillion dollars.

physicsguy|1 year ago

The interesting part is that distillations of reinforcement-learning-based models are performing so well. That brings the cost of certain tasks down dramatically.
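For readers unfamiliar with distillation: the standard objective (Hinton-style knowledge distillation, not necessarily DeepSeek's exact recipe) trains the small model to match the big model's full output distribution. A minimal numpy sketch of that loss:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                        # temperature-soften the logits
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between softened teacher and student distributions:
    the student learns the teacher's full distribution over answers,
    not just its single argmax answer."""
    p = softmax(teacher_logits, T)   # teacher's soft targets
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

t = np.array([4.0, 1.0, 0.5])
print(distill_loss(t, t))                         # 0.0: identical logits
print(distill_loss(np.array([0.5, 1.0, 4.0]), t))  # positive: mismatch
```

The cheap part is that generating teacher outputs is inference-only, and the student is far smaller, which is why the distilled models are so much less expensive to train and run.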