top | item 38599156

rrsp | 2 years ago

https://docs.mistral.ai/platform/pricing

Pricing has been released too.

Per 1 million output tokens:

Mistral-medium $8

Mistral-small $1.94

gpt-3.5-turbo-1106 $2

gpt-4-1106-preview $30

gpt-4 $60

gpt-4-32k $120

This suggests that they’re reasonably confident that the mistral-medium model is substantially better than gpt-3.5.


raphaelj|2 years ago

Do we have estimates of the energy requirements for these models?

I just did some napkin math; it looks like inference on a 30B model with an RTX 4090 should get you about 30 tokens/sec [1], or ~100k tokens/hour.

Considering such systems consume about 1 kW, that's about 10 kWh per 1M tokens.

Based on current electricity costs, I don't think anyone could get below $2–4 per 1M tokens for a 30B model.

[1] https://old.reddit.com/r/LocalLLaMA/comments/13j5cxf/how_man...
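The napkin math above can be sketched as a quick sanity check. The throughput, power draw, and electricity prices are the commenter's assumed figures, not measurements:

```python
# Energy cost per 1M tokens for single-stream inference (no batching).
# Assumed: ~30 tokens/sec on a 30B model, ~1 kW whole-system draw,
# $0.20-0.40/kWh electricity (all figures from the comment above).
tokens_per_sec = 30
system_power_kw = 1.0

tokens_per_hour = tokens_per_sec * 3600                      # 108,000 tokens/hour
kwh_per_million = system_power_kw * 1_000_000 / tokens_per_hour  # ~9.26 kWh

for price_per_kwh in (0.20, 0.40):
    cost = kwh_per_million * price_per_kwh
    print(f"${price_per_kwh:.2f}/kWh -> ${cost:.2f} per 1M tokens")
```

At these assumptions the range works out to roughly $1.85–$3.70 per 1M tokens, consistent with the "$2–4" ballpark.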

filterfiber|2 years ago

FWIW (I need to re-measure, but IIRC) my system with a 4090 only uses ~500 W (maybe up to 600 W) during LLM inference. LLMs have a much harder time saturating the compute compared to Stable Diffusion, I'm assuming because of the VRAM speed (and this is all on-card, nothing swapping from system memory). The 4090 itself only really used 300–400 W most of the time because of this.

If you consider 600 W for the entire system, that's only 6 kWh per 1M tokens; for me, 6 kWh at $0.20/kWh is $1.20 per 1M tokens.

And that's without the power-efficiency improvements an H100 has over the 4090. So I think $2/1M should be achievable once you combine the efficiencies of H100s, batching, etc. Since LLMs generally dwarf the network delay anyway, you could host in places like Washington for dirt-cheap prices (their residential rates are almost half of what I used for these calculations).
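The revised figure follows the same formula with the measured numbers from this comment (the ~600 W draw, ~100k tokens/hour, and $0.20/kWh rate are the commenter's own estimates):

```python
# Same kWh-per-million-tokens calculation, with the measured
# ~600 W whole-system draw instead of the assumed 1 kW.
system_power_kw = 0.6
tokens_per_hour = 100_000
price_per_kwh = 0.20  # assumed residential rate from the comment

kwh_per_million = system_power_kw * 1_000_000 / tokens_per_hour  # 6.0 kWh
cost_per_million = kwh_per_million * price_per_kwh               # $1.20
print(f"{kwh_per_million:.1f} kWh/1M tokens -> ${cost_per_million:.2f} per 1M tokens")
```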

jillesvangurp|2 years ago

Depends how and where you source your energy. If you invest in your own solar panels and batteries, all that energy is essentially fixed price (cost of the infrastructure) amortized over the lifetime of the setup (1-2 decades or so). Maybe you have some variable pricing on top for grid connectivity and use the grid as a fallback. But there's also the notion of selling excess energy back to the grid that offsets that.

So, 10 kWh could cost a lot less than what you cite. That's also how grid operators make money: they generate cheaply and sell with a nice margin. In some markets, prices are determined by the most expensive energy sources on the grid (coal, nuclear, etc.), so that pricing doesn't reflect the actual cost of renewables, which is typically a lot lower. Anyone consuming large amounts of energy will be looking to cut their costs. For data centers, that typically means investing in energy generation, storage, and efficient hardware and cooling.

avereveard|2 years ago

Batching changes that equation a fair bit. Also, these cards will not consume full power, since LLMs are mostly limited by memory bandwidth and the compute units get some idle time.

singhrac|2 years ago

Is $0.2–0.4/kWh a good estimate for the price paid in a data center? That's pretty expensive for energy, and I think vPPA prices at big data centers are much lower ($0.10/kWh is a decent upper bound in the US, though I could see the EU being more expensive by 2x).

Filligree|2 years ago

The 4090 is considerably more power-hungry than e.g. an A100, however.

kaliqt|2 years ago

Well, the 4090 is certainly less efficient at this. They are using H100s or better, no doubt. If they optimize for TPUs, it'll be even better.

brandall10|2 years ago

I get 40 tok/sec on my M3 Max with various 34B models; I gather a desktop 4090 would be at least 80?

dcastm|2 years ago

If you take input tokens into consideration, it's more like 5.25 EUR vs. 1.5 EUR per million tokens overall.

Mistral-small seems to be the most direct competitor to gpt-3.5, and it's cheaper (1.2 EUR per million tokens).

Note: I'm assuming equal weight for input and output tokens, and I cannot see the prices in USD :/
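The equal-weight blend can be sketched as below. The input-side prices are assumptions back-solved from the blended figures in this comment, not confirmed list prices:

```python
# Equal-weight blended price per 1M tokens (input + output averaged).
# Input prices below are assumptions inferred from the comment's
# blended figures; only the output prices appear in the thread.
def blended(input_price, output_price, input_weight=0.5):
    """Weighted average of input and output per-million-token prices."""
    return input_weight * input_price + (1 - input_weight) * output_price

print(blended(2.5, 8.0))   # mistral-medium      -> 5.25 (EUR/1M)
print(blended(1.0, 2.0))   # gpt-3.5-turbo-1106  -> 1.5
print(blended(0.6, 1.8))   # mistral-small       -> ~1.2 (EUR/1M)
```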

stavros|2 years ago

Does the 8x7B model really perform at a GPT-3.5 level? That means we might see GPT-3.5 models running locally on our phones in a few years.

infecto|2 years ago

I don’t think it’s safe to assume any of this. It’s still a limited release, which reads as invite-only. Once it hits some kind of GA, we can test and verify.

raincole|2 years ago

It's safe to assume they are confident it's better than 3.5. But people can be confident and wrong.

raverbashing|2 years ago

Do they all use the same tokenizer? (I mean, Mistral vs GPT)

superkuh|2 years ago

No. Mistral uses SentencePiece, and the GPT models use tiktoken.

YetAnotherNick|2 years ago

> This suggests that they’re reasonably confident that the mistral-medium model is substantially better than gpt-3.5

How did you reach that conclusion? Maybe they are counting on people paying extra just to avoid vendor lock-in.

antifa|2 years ago

The only vendor lock-in to GPT3.5 is the absence (perceived or real) of competitors at the same quality and availability.

epups|2 years ago

I understand how Mistral could end up being the most popular open-source LLM for the foreseeable future. What I cannot understand is who they expect to convince to pay for their API. As long as you are shipping your data to a third party, whether they are running an open- or closed-source model is inconsequential.

chadash|2 years ago

I pay for hosted databases all the time. It’s more convenient. But those same databases are popular because they are open source.

I also know that because it’s open source, if I ever have a need to, I can host it on my own servers. Currently I don’t have that need, but it’s nice to know that it’s in the cards.

simonw|2 years ago

The big advantage of a hosted open model is insurance against model changes.

If you carefully craft and evaluate your more complex prompts against a closed model... and then that model is retired, you need to redo that process.

A lot of people were burned when OpenAI withdrew Codex, for example. I think that was a poor decision by OpenAI as it illustrated this exact risk.

If the hosted model you are using is open, you have options for continuing to use it should the host decide to stop offering it.

vidarh|2 years ago

You may be fine with shipping your data to OpenAI or Mistral, but worry about what happens if they change terms or if their future models change in a way that causes problems for you, or if they go bankrupt. In any of those cases, knowing you can take the model and run it yourself (or hire someone else to run it for you) mitigates risk. Whether those risks matter enough will of course differ wildly.

baq|2 years ago

Same reason why you would use GPT-4. Plenty of people pay for that, some pay really good money.

antifa|2 years ago

If I'm happy with my infrastructure being built on top of the potential energy of a load-bearing rugpull, I'd probably stick with OpenAI in the average use case.

code51|2 years ago

gpt-3.5 is heavily subsidized.

Mistral may just be aiming for a more sustainable price for the long run.

ipsum2|2 years ago

What's your evidence for that claim?