rrsp|2 years ago
Pricing has been released too.
Per 1 million output tokens:
Mistral-medium $8
Mistral-small $1.94
gpt-3.5-turbo-1106 $2
gpt-4-1106-preview $30
gpt-4 $60
gpt-4-32k $120
This suggests that they're reasonably confident that the mistral-medium model is substantially better than gpt-3.5.
raphaelj|2 years ago
I just did some napkin math; it looks like inference on a 30B model with an RTX 4090 should get you about 30 tokens/sec [1], or ~100k tokens/hour.
Considering such systems consume about 1 kW, that's about 10 kWh/1M tokens.
Based on current electricity costs, I don't think anyone could get below $2 to $4 per 1M tokens for a 30B model.
[1] https://old.reddit.com/r/LocalLLaMA/comments/13j5cxf/how_man...
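The napkin math above can be sketched directly. The figures (30 tokens/sec, ~1 kW system draw, $0.20 to $0.40/kWh) are the thread's assumptions, not measurements:

```python
# Electricity-only cost of generating 1M tokens on a single-GPU rig.
# Inputs are the assumed figures from the comment above.

def cost_per_million_tokens(tokens_per_sec, system_watts, usd_per_kwh):
    """Electricity cost in USD to generate 1M tokens."""
    hours = 1_000_000 / tokens_per_sec / 3600   # time to emit 1M tokens
    kwh = hours * system_watts / 1000           # energy consumed in that time
    return kwh * usd_per_kwh

# ~1 kW rig at 30 tok/s consumes about 9.26 kWh per 1M tokens:
print(round(1_000_000 / 30 / 3600 * 1.0, 2))                # 9.26 kWh
print(round(cost_per_million_tokens(30, 1000, 0.20), 2))    # $1.85 at $0.20/kWh
print(round(cost_per_million_tokens(30, 1000, 0.40), 2))    # $3.70 at $0.40/kWh
```

Which lands in the $2 to $4 range quoted above (the exact energy figure is 9.26 kWh, rounded to 10 in the comment).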
filterfiber|2 years ago
If you consider 600 W for the entire system, that's only ~6 kWh/1M tokens; for me, 6 kWh at $0.20/kWh is $1.20/1M tokens.
And that's without the power-efficiency improvements an H100 has over the 4090. So I think $2/1M should be achievable once you combine the efficiencies of H100s, batching, etc. Since LLM inference time generally dwarfs network delay anyway, you could host in places like Washington for dirt-cheap power (their residential rates are almost half of what I used in my calculation).
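The batching argument above can be made concrete. With batched inference, the same system power is shared across concurrent sequences, so per-token energy drops roughly with batch size. This sketch assumes idealized linear scaling, which real batching only approximates:

```python
# Per-1M-token electricity cost when a fixed-power system serves a batch
# of concurrent sequences. Linear throughput scaling is an assumption.

def usd_per_million_tokens(tok_per_sec_per_seq, batch_size, watts, usd_per_kwh):
    total_tok_per_sec = tok_per_sec_per_seq * batch_size
    kwh_per_mtok = (1_000_000 / total_tok_per_sec / 3600) * watts / 1000
    return kwh_per_mtok * usd_per_kwh

# Single stream, 600 W system at $0.20/kWh (the figures in the comment;
# the exact energy is 5.56 kWh, rounded to 6 above):
print(round(usd_per_million_tokens(30, 1, 600, 0.20), 2))   # $1.11
# Batch of 8 under idealized linear scaling:
print(round(usd_per_million_tokens(30, 8, 600, 0.20), 2))   # $0.14
```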
jillesvangurp|2 years ago
So, 10 kWh could cost a lot less than what you cite. That's also how grid operators make money: they generate cheaply and sell with a nice margin. In some markets, prices are set by the most expensive energy sources on the grid (coal, nuclear, etc.), so that pricing doesn't reflect the actual cost of renewables, which is typically a lot lower. Anyone consuming large amounts of energy will be looking to cut their costs; for data centers, that typically means investing in energy generation, storage, and efficient hardware and cooling.
airgapstopgap|2 years ago
Here's how it works in reality:
https://docs.mystic.ai/docs/mistral-ai-7b-vllm-fast-inferenc...
dcastm|2 years ago
Mistral-small seems to be the most direct competitor to gpt-3.5, and it's cheaper (€1.20 / million tokens).
Note: I'm assuming equal weight for input and output tokens, and I can't see the prices in USD :/
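The equal-weight assumption above amounts to averaging the input and output rates. A minimal sketch, with illustrative placeholder rates rather than Mistral's actual price sheet:

```python
# Blended $/1M-token price for a given output share of traffic.
# Example rates below are hypothetical, for illustration only.

def blended_price(input_per_mtok, output_per_mtok, output_fraction=0.5):
    """Effective $/1M tokens when output_fraction of tokens are output."""
    return input_per_mtok * (1 - output_fraction) + output_per_mtok * output_fraction

print(blended_price(1.0, 2.0))        # 1.5  (equal-weight 50/50 split)
print(blended_price(1.0, 2.0, 0.8))   # 1.8  (output-heavy workload)
```

Real workloads are often input-heavy (long prompts, short completions), so the blended figure shifts with the traffic mix.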
up6w6|2 years ago
https://www-files.anthropic.com/production/images/model_pric...
YetAnotherNick|2 years ago
How did you reach that conclusion? Maybe they're counting on people paying extra just to avoid vendor lock-in.
chadash|2 years ago
I also know that because it's open source, if I ever have the need, I can host it on my own servers. Currently I don't have that need, but it's nice to know it's in the cards.
simonw|2 years ago
If you carefully craft and evaluate your more complex prompts against a closed model... and then that model is retired, you need to redo that process.
A lot of people were burned when OpenAI withdrew Codex, for example. I think that was a poor decision by OpenAI as it illustrated this exact risk.
If the hosted model you are using is open, you have options for continuing to use it should the host decide to stop offering it.
code51|2 years ago
Mistral may just be aiming for a more sustainable price for the long run.