This is a thorough "how to," but it's missing the "why" behind any of the chosen starting elements.
I don't understand why you would take an old dataset that worked for Llama 2 and just fine-tune Llama 3 on it. Isn't it most likely that the new model has already covered everything it missed last time around, and that the old dataset is now only valuable for the last generation?
This might be an unfair statement, but it really feels like none of these blogs know why. They copy/paste each other (you often see the same errors in multiple notebooks/blogs), and I have a feeling no one really deeply understands what they're doing.
Thank you for saying this! The number of people who actually need to fine-tune, versus just using RAG, is really small. People who aren't familiar with the field often jump to fine-tuning as the first option.
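To make the comparison concrete: fine-tuning changes the model's weights, while RAG just retrieves relevant documents and prepends them to the prompt at query time. Here's a minimal sketch of the retrieval step, using a toy bag-of-words cosine similarity over the Python standard library only — a real setup would use an embedding model and a vector store, and all the names and documents here are illustrative.

```python
# Toy RAG retrieval: score documents against a query with
# bag-of-words cosine similarity, keep the top-k, and build
# a prompt that includes them as context.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    scored = sorted(
        ((cosine(q, Counter(d.lower().split())), d) for d in docs),
        reverse=True,
    )
    return [d for _, d in scored[:k]]

# Hypothetical document store standing in for a real index.
docs = [
    "Fine-tuning updates model weights on new training data.",
    "RAG retrieves documents and adds them to the prompt.",
    "Quantization shrinks a model for cheaper inference.",
]

context = retrieve("how does RAG use documents", docs, k=1)
prompt = ("Context:\n" + "\n".join(context)
          + "\n\nQuestion: how does RAG use documents?")
```

The point of the sketch: nothing about the model changes — keeping knowledge fresh just means updating `docs`, which is why RAG is usually the cheaper first thing to try.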
Since the crypto (currency) craze of 2017, every time I hear "consumer GPU" somewhere in a story that has nothing to do with gaming, it sends a chill down my spine.