lllllm | 5 months ago

Martin here from the Apertus team, happy to answer any questions if I can.

the full collection of models is here: https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...

PS: you can run this locally on your Mac with these two commands:

pip install mlx-lm

mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"


trickstra|5 months ago

Hi, your "truly open" model is "gated" on Hugging Face, restricting downloads unless we agree to "hold you harmless" and share our contact info. Can you fix this please, either by removing the restriction or by removing the "truly open" claim?

lllllm|5 months ago

We hear you. Nevertheless, this is one of the very few open-weights and open-data LLMs, and the license is still very permissive (compare, for example, to Llama). Personally, of course, I'd like to remove the additional click, but the universities also have a say in this.

trcf22|5 months ago

Great job! Would it be possible to know what the cost of training such a model was?

menaerus|5 months ago

From their report:

> Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run.
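
The 5 GWh figure can be sanity-checked from the numbers in that quote. This is only a rough sketch: it assumes every one of the 4096 modules draws a constant 560 W for the full 90 days, and it covers compute only (no cooling, storage, or networking). The function name is made up for illustration and is not from the report.

```python
# Back-of-the-envelope check of the quoted estimate.
# Assumption: constant per-module draw for the whole run, compute only.
GPUS = 4096              # Grace-Hopper modules, from the quote
WATTS_PER_MODULE = 560   # assumed average draw, from the quote
POWER_LIMIT_WATTS = 660  # configured power limit, from the quote
DAYS = 90                # estimated run length, from the quote


def pretraining_energy_gwh(gpus=GPUS, watts=WATTS_PER_MODULE, days=DAYS):
    """Return energy in GWh for `gpus` modules drawing `watts` each for `days`."""
    hours = days * 24
    watt_hours = gpus * watts * hours
    return watt_hours / 1e9  # 1 GWh = 1e9 Wh


energy_gwh = pretraining_energy_gwh()
upper_bound_gwh = pretraining_energy_gwh(watts=POWER_LIMIT_WATTS)

print(f"at 560 W: {energy_gwh:.2f} GWh")       # at 560 W: 4.95 GWh
print(f"at 660 W: {upper_bound_gwh:.2f} GWh")  # at 660 W: 5.84 GWh
```

So the quoted "5 GWh" is the 560 W case rounded up. Turning that into a monetary cost would need an electricity price and the hardware cost, neither of which is given in the quote.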