top | item 44724887


techwizrd | 7 months ago

We have been fine-tuning models using Axolotl and Unsloth, with a slight preference for Axolotl. Check out the docs [0] and fine-tune or quantize your first model. There is a lot to be learned in this space, but it's exciting.

0: https://axolotl.ai/ and https://docs.axolotl.ai/
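
For a concrete starting point, a minimal Axolotl config for a QLoRA fine-tune looks roughly like this. This is a hedged sketch, not a tested recipe: the base model, dataset path, and hyperparameter values are illustrative assumptions, and field names may change between versions, so check the docs above for current options:

```yaml
# Illustrative Axolotl QLoRA config sketch -- values are assumptions,
# verify field names and defaults against docs.axolotl.ai
base_model: meta-llama/Llama-3.1-8B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: my_dataset.jsonl   # hypothetical local dataset
    type: alpaca

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs
```

Training is then launched with something like `axolotl train config.yml` (the exact CLI entry point varies by version).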


arkmm|7 months ago

When do you think fine-tuning is worth it over prompt engineering a base model?

I imagine with finetunes you have to worry about self-hosting, model utilization, and also retraining the model as new base models come out. I'm curious under what circumstances you've found that the benefits outweigh the downsides.

reissbaker|7 months ago

For self-hosting, there are a few companies that offer per-token pricing for LoRA finetunes (LoRAs are basically efficient-to-train, efficient-to-host finetunes) of certain base models:

- (shameless plug) My company, Synthetic, supports LoRAs for Llama 3.1 8b and 70b: https://synthetic.new All you need to do is give us the Hugging Face repo and we take care of the rest. If you want other people to try your model, we charge usage to them rather than to you. (We can also host full finetunes of anything vLLM supports, although we charge by GPU-minute for full finetunes rather than the cheaper per-token pricing for supported base model LoRAs.)

- Together.ai supports a slightly wider range of base models than we do, with a bit more config required, and any usage is charged to you.

- Fireworks does the same as Together, although they quantize the models more heavily (FP4 for the higher-end models). However, they support Llama 4, which is pretty nice although fairly resource-intensive to train.

If you have reasonably good data for your task, and your task is relatively "narrow" (e.g. find a specific kind of bug, rather than general-purpose coding; extract a specific kind of data from legal documents, rather than general-purpose reasoning about social and legal matters; etc.), finetunes of even a very small model like an 8b will typically outperform — by a pretty wide margin — even very large SOTA models, while being a lot cheaper to run. For example, if you find yourself hand-coding heuristics to fix some problem you're seeing with an LLM's responses, it's probably more robust to train a small model finetune on the data and have the finetuned model fix the issues, rather than writing hardcoded heuristics. On the other hand, no amount of finetuning will make an 8b model a better general-purpose coding agent than Claude 4 Sonnet.
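To make the "efficient-to-train" point about LoRAs concrete: instead of updating a full d_out × d_in weight matrix, a LoRA trains two low-rank factors B (d_out × r) and A (r × d_in), so the effective weight is W + B @ A. A back-of-the-envelope sketch in plain Python (the dimensions are illustrative assumptions, loosely sized like one projection matrix in an 8b model):

```python
# Back-of-the-envelope: trainable parameters for a full finetune of one
# weight matrix vs. a rank-r LoRA adapter (effective weight W' = W + B @ A).
d_out, d_in, r = 4096, 4096, 16  # illustrative dims, e.g. one attention projection

full_update_params = d_out * d_in        # full finetune: every entry of W is trainable
lora_params = d_out * r + r * d_in       # LoRA: only B and A are trainable

print(full_update_params)                     # 16777216 (~16.8M)
print(lora_params)                            # 131072   (~0.13M)
print(full_update_params // lora_params)      # 128x fewer trainable params
```

The same ratio holds per weight matrix across the whole model, which is why LoRAs are cheap to train and why hosts can serve many LoRAs on top of one shared set of base-model weights.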

seunosewa|6 months ago

When prompt engineering isn't giving you reliable results.

tough|7 months ago

Only for narrow applications, where a fine-tune lets you use a smaller model locally, specialised and trained for your specific use case.

whimsicalism|7 months ago

Finetuning rarely makes sense unless you are an enterprise, and even then it usually doesn't.

syntaxing|7 months ago

What hardware do you train on using Axolotl? I use Unsloth with Google Colab Pro.