lllllm | 5 months ago
the full collection of models is here: https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...
PS: you can run this locally on your Mac with these two commands:
pip install mlx-lm
mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"
menaerus | 5 months ago
> Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run.
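
For anyone who wants to check the arithmetic in that quote, here is a rough sketch. It only multiplies the quoted figures (4096 GPUs, 560 W per module, 90 days); the function name `pretraining_energy_gwh` is made up for illustration. The 660 W line is a hypothetical upper bound that assumes every module sits at the power limit for the whole run, which the quote does not claim.

```python
def pretraining_energy_gwh(num_gpus: int, watts_per_module: float, days: float) -> float:
    """Energy in GWh for num_gpus modules drawing watts_per_module for `days` days.

    Hypothetical helper: it covers only the module power from the quote and
    ignores cooling, networking, storage, and any restarts.
    """
    hours = days * 24
    watt_hours = num_gpus * watts_per_module * hours
    return watt_hours / 1e9  # 1 GWh = 1e9 Wh


# Figures taken from the quoted passage.
estimate = pretraining_energy_gwh(num_gpus=4096, watts_per_module=560, days=90)

# Assumption: same run with every module at the 660 W power limit.
upper = pretraining_energy_gwh(num_gpus=4096, watts_per_module=660, days=90)

print(f"{estimate:.2f} GWh at 560 W")  # 4.95 GWh at 560 W
print(f"{upper:.2f} GWh at 660 W")     # 5.84 GWh at 660 W
```

So the quoted "5 GWh" is 4096 x 560 W x 2160 h, which comes to about 4.95 GWh, rounded up.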