
How to use Alpaca-LoRA to fine-tune a model like ChatGPT

173 points | bfirsh | 3 years ago | replicate.com

48 comments

rishsriv | 3 years ago
This looks fantastic. Will try replacing our current fine-tuned FLAN-UL2 model with this.

I wonder how the devtooling around this will evolve. It seems like a matter of days until someone creates a GUI wrapper around this that obviates the need to spend programmer time on fine-tuning.

anymoonus | 3 years ago
I'm curious: what are the differences between T5, Flan-T5, and Flan-UL2 for fine-tuning? Does the instruction tuning matter at all once you're fine-tuning?
isoprophlex | 3 years ago
Low-rank adaptation (LoRA) ... has some advantages over previous methods:

- It is faster and uses less memory, which means it can run on consumer hardware.

- The output is much smaller (megabytes, not gigabytes).

- You can combine multiple fine-tuned models together at runtime.
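To make the list concrete, here is a minimal pure-Python sketch of a rank-r LoRA adapter on a single square d x d weight matrix: it counts trainable parameters for full fine-tuning versus the adapter, and shows that several adapters can be folded into the frozen base weight by adding their B·A products. The sizes (d = 4096, r = 8) and the random initialization are illustrative assumptions; real implementations also apply an alpha/r scaling and usually initialize B to zero, which this sketch omits, and summing two adapters is not guaranteed to produce a model that behaves well.

```python
# Minimal LoRA sketch on one d x d weight matrix (pure Python, illustration only).
# Assumptions: square weights, random init for both A and B, no alpha/r scaling.
import random


def randn(rows, cols, rng, std=0.02):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m)
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_param_counts(d, r):
    """Trainable parameters: full d x d matrix vs. adapter B (d x r) plus A (r x d)."""
    return d * d, 2 * d * r


class LoRAAdapter:
    def __init__(self, d, r, rng):
        self.A = randn(r, d, rng)  # r x d, trained
        self.B = randn(d, r, rng)  # d x r, trained

    def delta(self):
        # The update is the d x d product B @ A, but only A and B are stored,
        # which is why an adapter is far smaller than the model it modifies.
        return matmul(self.B, self.A)


def merged_weight(W, adapters):
    """Fold any number of adapters into the frozen base weight at load time."""
    out = W
    for adapter in adapters:
        out = add(out, adapter.delta())
    return out


full, lora = lora_param_counts(4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

Because merging is plain addition, swapping adapters means re-adding a different B·A to the same untouched base weights.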

This is great news for my dream of building a fine-tuned interactive messenger that can deliver a message on my behalf, trained on my personality and the information I want to convey.

Now just add text-to-speech and a talking head, as discussed in that other submission about cloning yourself with AI... https://news.ycombinator.com/item?id=35280418

elorant | 3 years ago
And then you hook it up to hundreds of dating apps, and it does the boring job of introductory chat, presenting you at the end with only the women who are interested in a real date.
cal5k | 3 years ago
What if you died but your chatbot didn't know?
camdenlock | 3 years ago
> The weights for LLaMA have not yet been released publicly. To apply for access, fill out this Meta Research form.

Cute. ;)

k_eshav | 3 years ago
IIRC someone posted the weights to a torrent. You can look it up. :)
tysam_and | 3 years ago
LoRA has actually been around for a little while! I first saw it when it became popular for fine-tuning models quantized down to about 8 bits. I'm sure it's being used in the 4-bit range by now! :D

I believe it's a core piece of the toolbox needed to really push the limits of LLMs, in either original training or inference, similar to what batch norm was for convolutional neural networks. I look forward to seeing how this will be applied in the future.

mnreef | 3 years ago
Hi all, I have a noob question. I have been reading about Alpaca and Alpaca-LoRA. I have a use case in which I want to fine-tune/train Alpaca-LoRA on a large corpus of books in .txt format. I know that for Alpaca the data was in an "Instruction : Prompt" format; however, my text is huge and not in that format. It's simply a library of books and journal articles. I want to be able to ask a question and have the model answer based on the books I trained it on. I also want to be able to ask general questions, for example which books discussed topic X or Y.
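For reference, a sketch of the record format Alpaca-style fine-tuning uses, next to a naive way of cutting a plain .txt book into overlapping passages. The field names instruction / input / output follow the Stanford Alpaca dataset, and the prompt wording is modeled on its template (check the repo you train with for the exact text). chunk_text and its sizes are hypothetical, and counting characters instead of tokens is a simplification. Turning passages into instruction records still requires an instruction and a target output for each one; this sketch only shows the shapes involved.

```python
# Sketch: Alpaca-style records vs. raw book text.
# Field names follow the Stanford Alpaca data (instruction / input / output);
# the template wording is modeled on the Alpaca repo and may differ per trainer.

TEMPLATE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:\n"
)
TEMPLATE_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record):
    """Turn one Alpaca-style record into training text (prompt + target output)."""
    template = TEMPLATE_WITH_INPUT if record.get("input") else TEMPLATE_NO_INPUT
    # str.format ignores unused keyword arguments such as "output".
    return template.format(**record) + record["output"]


def chunk_text(text, size=2000, overlap=200):
    """Hypothetical helper: split raw text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


if __name__ == "__main__":
    # Placeholder record: the output would have to be written for each passage.
    record = {"instruction": "Summarize the passage.", "input": "<passage>", "output": "<summary>"}
    print(build_prompt(record))
```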

I have tried OpenAI's API to create embeddings, but I want to use Alpaca.

I really appreciate your help.
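On the embeddings route: the core of embedding-based lookup is ranking stored text chunks by cosine similarity to the query's embedding, regardless of which model produced the vectors. A pure-Python sketch; the two-dimensional vectors in the usage line are toy placeholders, not real embeddings, and producing embeddings with a local model is not shown.

```python
# Rank stored chunks by cosine similarity to a query embedding (sketch).
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0  # zero vectors score 0


def top_k(query_vec, index, k=3):
    """index: list of (chunk_id, vector) pairs; returns the k most similar ids."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]


if __name__ == "__main__":
    toy_index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print(top_k([1.0, 0.1], toy_index, k=2))  # ['a', 'c']
```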

braingenious | 3 years ago
I love the idea of LoRAs for LLMs.

Has anybody made a LLaMA/Alpaca Erebus model? I read about them in the oobabooga docs, and a locally run language model fine-tuned on Literotica could be the funniest thing I’ve ever seen.

credit_guy | 3 years ago
I guess this LoRA is the missing piece.

NVIDIA stated recently that GPT bots will become one million times more powerful in ten years. Many people doubted that.

With LoRA, I see room for a much bigger improvement. These guys claim a 10,000x reduction in the number of trainable parameters. Another way to look at it is that with current hardware you can train a model with 10,000 times more parameters. Add a 100x improvement in hardware over 10 years (not at all unrealistic) and that's the million. And we will see significant improvements in training methods too.
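The estimate in the comment, written out; both factors are the comment's own figures, not measurements. For a single d x d matrix, a rank-r adapter cuts trainable parameters by a factor of d/(2r), so the overall ratio depends on the chosen rank and on which matrices get adapters.

$$
\underbrace{10^{4}}_{\text{fewer trainable parameters}} \times \underbrace{10^{2}}_{\text{hardware, 10 years}} = 10^{6},
\qquad
\frac{d^{2}}{2dr} = \frac{d}{2r}
$$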

flangola7 | 3 years ago
Where do you find 10,000x more data?
nico | 3 years ago
Can a model be fine-tuned “online”?

If cost weren’t an issue, could I fine-tune a model in real time while also using it for inference?

nl | 3 years ago
Yes and no.

The training process modifies the network weights. These are usually written to a copy of the file instead of overwriting it (because what if the loss is actually worse after an epoch of training?)

But there's nothing stopping inference from occurring on a model that is being trained.
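A toy illustration of the point above, assuming a one-variable linear model instead of an LLM: weights are updated one example at a time, predictions are served between updates from whatever the weights currently are, and a checkpoint copy is replaced only when evaluation loss improves. OnlineLinearModel, the learning rate, and the data (y = 2x + 1) are all hypothetical; real LLM fine-tuning is batched and GPU-bound, but the read-while-training and copy-on-improvement pattern is the same idea.

```python
# Toy "online" fine-tuning: y = w*x + b trained one example at a time while
# still serving predictions. Hypothetical setup, not an LLM training loop.
import copy


class OnlineLinearModel:
    def __init__(self, lr=0.05):
        self.params = {"w": 0.0, "b": 0.0}
        self.lr = lr
        self.best_loss = float("inf")
        self.checkpoint = copy.deepcopy(self.params)  # last known-good copy

    def predict(self, x):
        # Inference reads the live weights, even mid-training.
        return self.params["w"] * x + self.params["b"]

    def train_step(self, x, y):
        # One SGD step on the squared error (pred - y)^2.
        err = self.predict(x) - y
        self.params["w"] -= self.lr * 2 * err * x
        self.params["b"] -= self.lr * 2 * err
        return err * err

    def maybe_checkpoint(self, eval_data):
        # Save to a separate copy only when eval loss improves, so a bad
        # update never overwrites the last good weights.
        loss = sum((self.predict(x) - y) ** 2 for x, y in eval_data) / len(eval_data)
        if loss < self.best_loss:
            self.best_loss = loss
            self.checkpoint = copy.deepcopy(self.params)
        return loss


def demo(steps=1000):
    model = OnlineLinearModel()
    eval_data = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
    for step in range(steps):
        x = (step % 10) / 10
        model.train_step(x, 2 * x + 1)  # training update
        if step % 100 == 0:
            _ = model.predict(0.5)  # serve a query mid-training
            model.maybe_checkpoint(eval_data)
    return model


if __name__ == "__main__":
    m = demo()
    print(round(m.predict(0.5), 3), m.checkpoint)
```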

all2 | 3 years ago
Stats from TFA say 3 hours to fine-tune on an A100 GPU.
rcarmo | 3 years ago
So they use cog before installing it? Apparently this wasn’t proofread.

Also, is it just me, or are more ways to run LLMs on a CPU than on a GPU currently springing up on GitHub? I have hacked together my own, but my chat UI is awful, so what is the nicest pre-packaged, CUDA-friendly way to run this now?

eachro | 3 years ago
How does LoRA save more than 50% of the memory usage? I see that the weight updates have a much lower memory footprint by virtue of being low-rank. But you still need the dense weights for the forward pass, don't you?
leereeves | 3 years ago
I'm not an expert, but I believe it only saves memory in the final model, after training is done, by merging the low-rank LoRA wrapper matrices with the original weight matrices.

For example, if an original layer has N inputs and N outputs (an NxN weight matrix), LoRA adds a pair of small matrices alongside it, a 16xN matrix A and an Nx16 matrix B, trains only those new matrices, and finally multiplies B by A to get an NxN update that is added to the original weight matrix.
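A minimal pure-Python sketch of the mechanics being discussed, assuming a square d x d layer and rank r: the unmerged LoRA layer computes W x + B(A x), so the dense frozen W is indeed still read in the forward pass. In standard LoRA training, gradients and optimizer state are kept only for A and B, not for W, which is where much of the training-time memory saving comes from. After training, B·A can be folded into W, giving an ordinary d x d matrix with the same outputs. The sizes, random values, and the scale parameter (implementations typically use alpha/r) are illustrative.

```python
# Unmerged vs. merged LoRA layer on a d x d weight (pure Python sketch).
import random


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x). W (d x d) is frozen; A (r x d) and B (d x r) train."""
    base = matvec(W, x)            # the dense frozen weights are still used here
    low = matvec(B, matvec(A, x))  # d -> r -> d through the low-rank path
    return [b + scale * l for b, l in zip(base, low)]


def merge(W, A, B, scale=1.0):
    """Fold the adapter into W after training: W + scale * (B @ A), still d x d."""
    d, r = len(W), len(A)
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
            for i in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, r = 6, 2
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    A = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(r)]
    B = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d)]
    x = [rng.uniform(-1, 1) for _ in range(d)]
    y_unmerged = lora_forward(W, A, B, x, scale=0.5)
    y_merged = matvec(merge(W, A, B, scale=0.5), x)
    print(max(abs(a - b) for a, b in zip(y_unmerged, y_merged)))
```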

slicktux | 3 years ago
Anyone else click on this thinking it was about the wireless protocol?
y3sar | 3 years ago
Both are fascinating tech.
techn00 | 3 years ago
It feels like I'm living in a cartoon with all these terms:

> In this blog post, we’ll show you how to use LoRA to fine-tune LLaMA using Alpaca training data.