stocknoob | 1 year ago

> How much VRAM and inference compute is required to run 3.1-70B vs 2-70B?

We aren’t trying to mindlessly consume the same VRAM as last year and hope costs magically drop. We are noticing that we can get last year’s mid-level performance on this year’s low-end model, leading to cost savings at that perf level. The same thing happens next year, leading to a drop in cost at any given perf level over time.
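
To make that concrete, here's a toy sketch in Python. The tiers, quality scores, and per-token prices are made up for illustration (not real benchmarks or real pricing); the point is just that cost at a fixed perf level falls once a cheaper tier catches up:

    # Illustrative only: hypothetical model tiers, not real benchmarks.
    # Each tier maps to (quality score, $ per 1M tokens).
    catalog = {
        2023: {"low": (60, 0.5), "mid": (70, 1.0)},
        2024: {"low": (70, 0.4), "mid": (80, 0.9)},
    }

    def cheapest_at_quality(year, target):
        # Cheapest tier in a given year that clears the quality bar.
        prices = [price for quality, price in catalog[year].values()
                  if quality >= target]
        return min(prices) if prices else None

    # Quality 70 needed the mid tier in 2023 ($1.0/M tokens); in 2024
    # the low tier reaches it, so cost at that perf level drops to $0.4/M.
    print(cheapest_at_quality(2023, 70))  # 1.0
    print(cheapest_at_quality(2024, 70))  # 0.4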

> For training. Not for inference. GPU prices remained about the same, give or take.

See:

https://epoch.ai/blog/trends-in-gpu-price-performance

We don’t care about the absolute price; the question is whether the cost per FLOP or cost per GB is decreasing over time with each new GPU.
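
A back-of-the-envelope version of that question in Python. The prices and peak-FLOPS figures below are rough placeholders I'm assuming for illustration, not vendor-verified specs:

    # Placeholder numbers: treat price and peak TFLOPS as assumptions.
    # What matters is the $/FLOP trend, not the sticker price.
    gpus = {
        "gen N":   {"price_usd": 10_000, "tflops": 312},
        "gen N+1": {"price_usd": 25_000, "tflops": 990},
    }

    for name, g in gpus.items():
        print(f"{name}: ${g['price_usd'] / g['tflops']:.1f} per TFLOP/s")

    # The newer card's absolute price is 2.5x higher, yet cost per unit
    # of compute falls (roughly $32 -> $25 per TFLOP/s), which is the
    # trend the Epoch post tracks.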

---

If it isn’t clear why inference costs at any given performance level will drop given the points above, unfortunately I can’t help you further.

menaerus | 1 year ago

We absolutely care about absolute costs. A 70B model this year will cost as much to run as it will next year, unless Nvidia decides to give up its margins. The question is whether inference cost is dropping. And the answer is obviously no. I see that you're out of your depth, so let's just stop here.