top | item 40188955

(no title)

The other recent improvement suggested for LoRA is DoRA: https://magazine.sebastianraschka.com/p/lora-and-dora-from-s.... It really does seem to strongly outperform LoRA - see also https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.htm...

discuss

josalhor|1 year ago

I just skimmed over LoRA+ and DoRA and I see no reason why these improvements could not go hand in hand. Actually, LoRA+ seems to be about efficient training while DoRA seems about improving the ability to actually learn, making it significantly more robust. Although I still have my questions on how the improvements of LoRA+ would be applied to the magnitude vector.

WithinReason|1 year ago

The two methods seem to be independent, wonder if you can combine them for even better performance.

Interestingly both seem to indirectly modify the optimisation process, in my opinion effectively trying to fix a bad optimiser. Seems like we still have a long way to go after Adam...

neodypsis|1 year ago

> Seems like we still have a long way to go after Adam...

A preprint in arxiv suggests that Adam works better than SGD for training LLMs due to the issue of class-imbalance [0]. It appears that scaling the gradient step helps with the training, for example, see another approach suggested in [1].

0. https://arxiv.org/pdf/2402.19449 1. https://arxiv.org/pdf/2402.02347

Ger_Onimo|1 year ago

I've just started playing with DoRAs for fine-tuning TTS models towards particular styles of speech, and they're working extremely well!

allpaca|1 year ago

Can you tell us more about it? Have you reported the results of your experiments in a post?

cooljoseph|1 year ago

Those blog posts are pretty bad. Just read the original paper, https://arxiv.org/pdf/2402.09353. The key section is 4.1.