Helpful. I was thinking today about when it makes sense to fine-tune vs. use embeddings to feed into the LLM prompt, and this helped solidify my understanding.
Except that the article didn't cover that distinction at all. It looked at (manual) prompt engineering vs. fine-tuning. What you are describing is Retrieval Augmented Generation (RAG): create embeddings from a knowledge base, do a similarity search using an embedding of the search query, and then programmatically generate a prompt from the search query and the returned content. IMO, this design pattern should be preferred to fine-tuning in the vast majority of use cases. Fine-tuning should be used to get the model to perform new tasks; RAG should be used instead to add knowledge.
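The RAG pipeline described above can be sketched in a few lines. This is a toy illustration, not production code: the `embed` function below is a hypothetical stand-in (a bag-of-words vector over a tiny fixed vocabulary) for a real embedding model, and the document list is made up. The retrieve-then-build-prompt structure is the part that matters.

```python
import re
import numpy as np

# Hypothetical stand-in for a real embedding model (normally an API or
# model call); maps text to a normalized bag-of-words vector.
VOCAB = ["refund", "shipping", "warranty", "battery", "return"]

def embed(text: str) -> np.ndarray:
    words = re.findall(r"[a-z]+", text.lower())
    vec = np.array([words.count(w) for w in VOCAB], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Index the knowledge base: one embedding per document.
docs = [
    "Our warranty covers battery defects for two years.",
    "Refund requests must include the original shipping receipt.",
]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    # 2. Similarity search: cosine similarity of the query embedding
    # against each document embedding (vectors are pre-normalized).
    scores = doc_vecs @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

def build_prompt(query: str) -> str:
    # 3. Programmatically assemble the LLM prompt from retrieved content.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("Is the battery covered by warranty?")
```

In practice the vector search is handled by a vector database or an approximate-nearest-neighbor index, and the final prompt is sent to the LLM, but the three steps stay the same.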
Realistically, this seems like a question that is difficult to answer in general without measuring. Intuition is unlikely to yield a better result than actually trying both.
DebtDeflation|2 years ago
joshka|2 years ago