dusanh | 1 year ago
Where would I start if I wanted to use a model programmatically? Say I'm building a chatbot: I have a large data set of replies I want the model to mimic, and I'd want to do this in Python. Of course, I'd probably use a different model than Granite.
everforward | 1 year ago
Before doing that, I would start basic. Pull llama3 and see what it does with your prompts. You may be surprised how much is already in there, and you may not need to involve your own data at all. If that doesn’t work, check HuggingFace to see if someone has already made a model/fine-tune/LoRA for what you’re trying to do. There are many; e.g., I found a Magic: The Gathering rules model the other day.
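To answer the programmatic part: one way to do the "pull llama3 and try it" step from Python is Ollama's local REST API. This is a sketch, assuming Ollama is installed, the server is running on its default port (11434), and llama3 is already pulled:

```python
# Sketch of driving a local model from Python via Ollama's REST API.
# Assumes the Ollama server is running locally and `ollama pull llama3`
# has been done; otherwise the request in __main__ will fail.
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_payload(model, user_message):
    """Assemble a non-streaming chat request body for /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }

def chat(model, user_message):
    """POST one user message and return the assistant's reply text."""
    data = json.dumps(build_payload(model, user_message)).encode()
    req = request.Request(OLLAMA_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    print(chat("llama3", "Reply like a pirate: how do I boil an egg?"))
```

From there, "mimicking" your data set is mostly a question of what you put in the messages list (system prompts, few-shot examples) before reaching for the heavier options below.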
If those fail, or you just want to play with your own data, you’ll need to figure out what “mimic” means.
If the model does okay at generating content but the content is factually wrong or missing background, you may be able to just do RAG (retrieval-augmented generation). Basically, you run your documents through a model that converts them to embeddings (some kind of vector; I don’t fully understand how they work). Then when you run a query, you search for related embeddings and pass them to the model so that it “knows” the content that was in the documents. This is the easiest option; open-webui (the Ollama web chat interface) has some RAG support. Danswer is open source and built from the ground up to do RAG, with built-in support for ingesting from Slack, Drive, etc. OpenAI also offers embeddings as a service.
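The RAG loop described above (embed the documents, find the ones related to the query, stuff them into the prompt) can be sketched with a toy bag-of-words "embedding" so it runs without any model. A real system would swap embed() for a learned embedding model and the list for a vector store; the documents and query here are made up:

```python
# Toy RAG retrieval loop: embed docs, rank by cosine similarity to the
# query, prepend the best match to the prompt. The bag-of-words embed()
# is a stand-in for a real embedding model.
import re
from collections import Counter
from math import sqrt

DOCUMENTS = [
    "Granite is a family of language models from IBM.",
    "Ollama runs large language models locally.",
    "LoRA adapters add low-rank weight deltas to a base model.",
]

def embed(text):
    # Stand-in "embedding": lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

query = "how do I run models locally?"
context = retrieve(query, DOCUMENTS)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(context)  # -> Ollama runs large language models locally.
```

The prompt string is then what you'd send to the chat model; the model never sees the documents it wasn't retrieved for, which is why RAG scales to big document sets.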
A step up from that is making a LoRA. To my novice eyes, LoRAs are basically a diff of the model’s parameters or weights. So rather than training a whole new model, you just add deltas to an existing one. These let you “teach” the model something while preserving the base generation capabilities of the underlying model. I.e., you won’t have to worry about feeding it enough data to speak English properly, because it gets that from the base model; you only have to give it enough data to speak about whatever you’re training it on.
If that doesn’t make any sense, go check CivitAI for Stable Diffusion (image model) LoRAs. The effects are way more obvious on image AIs.
Anyway, LoRAs still have to be trained, so you’re into training territory there. I think HuggingFace has tools that make this easy, but I don’t know enough to say anything with confidence.
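The "diff of the model's weights" idea has concrete math behind it: a LoRA learns a low-rank delta B·A, and the adapted weight is W + B·A, so you only train the small factors B and A while W stays frozen. A toy plain-Python sketch, with made-up numbers:

```python
# Toy illustration of the LoRA idea: instead of updating a full d x d
# weight matrix W, train a low-rank delta B @ A (B is d x r, A is r x d)
# and use W' = W + B @ A at inference. Values are made up.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# "Base model" weight matrix (frozen during LoRA training).
W = [[1.0, 0.0],
     [0.0, 1.0]]

# Low-rank factors with rank r = 1. The savings grow with size:
# at d = 4096, r = 8 you train 2*d*r = 65,536 numbers instead of
# d*d = 16,777,216 for the full matrix.
B = [[0.5],
     [0.0]]
A = [[0.0, 1.0]]

delta = matmul(B, A)       # the "diff" of the weights
W_adapted = add(W, delta)  # effective weights at inference time
print(W_adapted)           # [[1.0, 0.5], [0.0, 1.0]]
```

That's also why LoRA files are tiny compared to the base model: you only ship the two small factors, not a full copy of the weights.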
The last option, which you almost certainly don’t want, is to train a new base model like llama3. You’re starting from zero there; you have no existing model, so you will have to teach it everything. It will take a ton of data, it will take forever to train, and it will likely be much worse than even a randomly picked model off HuggingFace. Meta has spent who knows how much on Llama and it still hallucinates.
If you end up training, you’ll probably end up doing it in the cloud unless you have tons of VRAM sitting idle. Prices are pretty reasonable; I think A100s are around $2/hr. I don’t know how to gauge how long it needs to train, but I believe it’s related to the amount of data you’re training on. I believe it’s pretty reasonable for LoRAs, though; I’m guesstimating in the $20-ish range?
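For what it's worth, the arithmetic behind that guesstimate is just rate times hours; both inputs are assumptions, not measurements:

```python
# Back-of-envelope cost for a cloud LoRA run. Both numbers are
# assumptions: ~$2/hr for an A100 and ~10 hours of training.
a100_rate_usd_per_hr = 2.0
lora_train_hours = 10
print(f"${a100_rate_usd_per_hr * lora_train_hours:.0f}")  # $20
```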
Edit: oh, and I’m not affiliated in any way, but I found out last night that Fireworks’ new function-calling model is free while it’s in beta, which is a neat/fun thing to play with: https://fireworks.ai/blog/firefunction-v1-gpt-4-level-functi... It’s also open weights if you want to run it locally, but it’s a 40B model, so I can’t run it on my 3060.
dusanh | 1 year ago