top | item 40014559

osanseviero | 1 year ago

Zephyr 141B is a Mixtral 8x22B fine-tune. Here are some interesting details:

- Base model: Mixtral 8x22B, 8 experts, 141B total params, 35B activated params

- Fine-tuned with ORPO, a new alignment algorithm with no SFT step (hence much faster than DPO/PPO)

- Trained on only 7K open data instances -> high-quality, synthetic, multi-turn

- Apache 2.0 license
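For context on the ORPO point: the odds-ratio term that gives ORPO its name can be sketched in plain Python. This is a simplified scalar version of my own, not the authors' training code; the actual loss in the ORPO paper operates on per-token sequence log-probabilities and adds a standard NLL term on the chosen response (which is how it folds SFT and preference alignment into one stage).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def odds(p: float) -> float:
    # Odds of the model assigning probability p to a response.
    return p / (1.0 - p)

def orpo_or_loss(p_chosen: float, p_rejected: float) -> float:
    # Odds-ratio penalty: near zero when the chosen response is much
    # more likely than the rejected one, large when the model prefers
    # the rejected response.
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    return -math.log(sigmoid(log_odds_ratio))

# Indifferent model: loss is log(2) ~ 0.693
print(orpo_or_loss(0.5, 0.5))
# Strong preference for the chosen response: loss ~ 0.012
print(orpo_or_loss(0.9, 0.1))
```

The full objective adds this term (scaled by a weight) to the usual next-token loss, which is why no separate SFT stage is needed.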

Everything is open:

- Final Model: https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v...

- Base Model: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1

- Fine-tune data: https://huggingface.co/datasets/argilla/distilabel-capybara-...

- Recipe/code to train the model: https://huggingface.co/datasets/argilla/distilabel-capybara-...

- Open-source inference engine: https://github.com/huggingface/text-generation-inference

- Open-source UI code: https://github.com/huggingface/chat-ui

Have fun!

loudmax | 1 year ago

I like that they say the model was trained for 1.3 hours on 4 nodes of 8 x H100s each. By my rough calculation, that should have cost around $100 (at $2 per GPU-hour, x 8 GPUs x 4 nodes x 1.3 hours, roughly $83). Not free, but pretty cheap in the scheme of things. At least, once you know what you're doing.
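The back-of-envelope math, spelled out (the $2/GPU-hour rate is the commenter's assumption; actual H100 pricing varies widely by provider):

```python
# Assumed cloud rate from the comment above; H100 pricing varies by provider.
USD_PER_GPU_HOUR = 2.0

nodes = 4
gpus_per_node = 8
hours = 1.3

total_gpu_hours = nodes * gpus_per_node * hours   # 41.6 GPU-hours
cost = total_gpu_hours * USD_PER_GPU_HOUR
print(f"~${cost:.2f}")   # ~$83.20
```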