top | item 38539323

float-trip | 2 years ago

Related comment from gwern: https://news.ycombinator.com/item?id=38438859. Can't find the docs now - I think they were the old GPT-3 ones - but they suggested a low value, somewhere between 0.01 and 0.1.

Also - why QLoRA rather than a full finetune? Using LambdaLabs, a full finetune would cost roughly the same as your quote. Cheaper, I think, if you're willing to gamble on fp8: https://github.com/mosaicml/llm-foundry/tree/main/scripts/tr.... And there are fewer hyperparameters to tune as well.
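The "costs roughly the same" claim comes down to GPU-hour arithmetic: at a per-GPU-hour price, total cost depends only on aggregate GPU-hours, not on how many GPUs you spread the run across. A minimal sketch - the throughput and pricing numbers below are invented placeholders, not LambdaLabs quotes:

```python
def finetune_cost(tokens, tok_per_sec_per_gpu, usd_per_gpu_hour):
    """Estimate the dollar cost of a training run.

    With per-GPU-hour billing and ideal scaling, cost is independent of
    GPU count: more GPUs finish sooner but bill at the same total rate.
    """
    gpu_hours = tokens / tok_per_sec_per_gpu / 3600
    return gpu_hours * usd_per_gpu_hour

# Hypothetical numbers: 1B training tokens, $1.29/GPU-hr.
# Full finetune at an assumed 3,000 tok/s per GPU:
full = finetune_cost(1e9, 3_000, 1.29)
# QLoRA at an assumed 1,500 tok/s per GPU (4-bit dequant overhead),
# but runnable on a single cheaper-to-rent GPU:
qlora = finetune_cost(1e9, 1_500, 1.29)

print(f"full finetune: ${full:.0f}, qlora: ${qlora:.0f}")
```

Under these made-up assumptions the QLoRA run actually bills *more* GPU-hours despite needing less memory, which is the point of the comment: QLoRA's win is fitting on fewer/smaller GPUs, not necessarily a lower total bill.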
