
drphilwinder | 2 years ago

Your sentiment is correct, but it's more of a spectrum. Fine-tuning can learn facts (otherwise, how would the foundation models learn facts?), but it needs those facts in the training dataset. If you have an infinite amount of facts, then you can memorise all of them.

The challenge arises when it becomes hard to generate that training data. If you just have the raw text and pop that in the context (i.e. RAG), then the LLM can be just as factual without any of that hassle.
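A minimal sketch of that RAG idea: retrieve the relevant raw text and prepend it to the prompt, rather than fine-tuning the facts in. The retriever here is a toy keyword-overlap ranker and all names are illustrative, not a real library API.

```python
# Toy RAG: retrieve the most relevant document by word overlap,
# then put it in the context so the LLM can answer from it.

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared words with the question (naive retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Prepend retrieved text so the model answers from context, not memory."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Llama 2 was released by Meta in July 2023.",
    "RAG stands for retrieval-augmented generation.",
]
print(build_prompt("When was Llama 2 released?", docs))
```

In a real system you'd swap the overlap score for embedding similarity, but the shape is the same: the facts live in the context window, not in the weights.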

Q2: Add identifiers in the prompt to say "you've been trained on this; only answer questions about this".

Q3: Depends on the size of the training data/docs. For the average PDF, about 30 minutes.

Give it a try!


gpderetta | 2 years ago

> If you have an infinite amount of facts, then you can memorise all of them

pigeon-hole?

gdiamos | 2 years ago

Not literally infinite, but Llama 2-scale models can handle about 10 trillion tokens.