top | item 38616174

duchenne | 2 years ago

> The training for Phi-2 took 14 days on 96 A100 GPUs

This would mean it costs roughly $30k USD to train.

If training an LLM becomes cheaper than buying a car, it could democratize AI a lot.
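The arithmetic behind the ~$30k estimate can be sketched as follows; the per-GPU-hour rate is an assumption (bulk/spot A100 pricing varies widely by provider).

```python
# Back-of-the-envelope check of the ~$30k training-cost figure.
# The $1/GPU-hour rate is assumed, not taken from any provider's price list.
gpus = 96
days = 14
rate_per_gpu_hour = 1.0  # USD, assumed

gpu_hours = gpus * days * 24           # total A100-hours consumed
cost = gpu_hours * rate_per_gpu_hour   # ~$32k at the assumed rate
print(gpu_hours, round(cost))
```

At ~$1/GPU-hour the 32,256 GPU-hours land right around the quoted figure; cheaper reserved or spot capacity would pull it below $30k.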


alecco | 2 years ago

Note the model is trained on data generated by GPT-4. It's probably orders of magnitude more expensive to generate the data at current API prices.

The whole point of these papers is that training data quality is key.

I would much prefer for these companies to release the training data than the weights. But that will never happen.

"We speculate that the creation of synthetic datasets will become, in the near future, an important technical skill and a central topic of research in AI."

verdverm | 2 years ago

This sounds like the methodology from "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes"

i.e. master teaches apprentice or LLM trains SLM

https://arxiv.org/abs/2305.02301 (May '23)
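The teacher-apprentice setup can be sketched as building a distillation dataset from a large model's outputs, loosely after the rationale-plus-label idea in that paper. `query_teacher` is a hypothetical stand-in for a GPT-4-class API call, stubbed here for illustration.

```python
# Sketch of LLM-trains-SLM data generation. The small model would later
# be fine-tuned on these records, typically with a multi-task objective
# over both the rationale and the final label.

def query_teacher(question: str) -> dict:
    # Stub: a real implementation would call a large model's API and
    # ask it to produce a step-by-step rationale plus a final answer.
    return {"rationale": f"Reasoning about: {question}", "answer": "42"}

def build_distillation_records(questions):
    records = []
    for q in questions:
        out = query_teacher(q)
        records.append({
            "input": q,
            "rationale": out["rationale"],  # supervises intermediate reasoning
            "label": out["answer"],         # supervises the final prediction
        })
    return records

data = build_distillation_records(["What is 6 * 7?"])
```

This is also why the data can dominate the cost: every training record costs a teacher API call.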

IanCal | 2 years ago

> Note the model is trained on data generated by GPT-4.

Is it? I couldn't find that on the page, and can't easily access the links. The previous paper used 1B tokens from GPT-3.5.

> It's probably orders of magnitude more expensive to generate the data at current API prices.

If you're generating a billion tokens, you might do better with dedicated instances; IIRC they used to say that beyond a few hundred million tokens a month, dedicated capacity was cheaper.
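For scale, the cost of a billion generated tokens at list API prices works out as below; the per-1k rate is an assumption (GPT-4's list price for output tokens in late 2023 was around $0.06 per 1k, and dedicated capacity would change the math).

```python
# Rough cost of generating 1B tokens via a pay-per-token API.
# The rate is assumed; real pricing depends on model and contract.
tokens = 1_000_000_000
price_per_1k_output = 0.06  # USD per 1k output tokens, assumed

api_cost = tokens / 1000 * price_per_1k_output  # $60,000 at the assumed rate
print(f"${api_cost:,.0f}")
```

Whether that beats dedicated instances depends entirely on the negotiated hourly rate and achievable throughput.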

Der_Einzige | 2 years ago

Training LoRAs, or using other parameter-efficient techniques to fine-tune LLMs, can be done on a 3090 today for basically nothing.
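The reason LoRA fits on a single consumer GPU is the parameter count: instead of updating a full d_out × d_in weight matrix, you train two low-rank factors. A minimal sketch, assuming a 4096-wide layer (a typical hidden size; the rank of 8 is likewise an assumed hyperparameter):

```python
import numpy as np

# Trainable-parameter comparison for one 4096x4096 layer.
d_out, d_in, r = 4096, 4096, 8

full_params = d_out * d_in           # 16,777,216 weights under full fine-tuning
lora_params = d_out * r + r * d_in   # 65,536 weights -- about 0.4% of the above

# Toy demonstration of the merge at inference time: W_eff = W + B @ A
# (the usual alpha/r scaling factor is omitted for brevity).
W = np.eye(3)                         # frozen base weights
B = np.array([[1.0], [0.0], [0.0]])  # d_out x r, here with r = 1
A = np.array([[0.0, 2.0, 0.0]])      # r x d_in
W_eff = W + B @ A
```

Only A and B receive gradients, so optimizer state and gradient memory shrink by the same ~250x factor, which is what makes a 24 GB 3090 sufficient.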

eternauta3k | 2 years ago

You don't need to train it again; Microsoft already did.

Unless you want to develop a new one, then you also need the team of researchers/engineers.