dsmmcken | 1 year ago

I feel like I should be writing with the goal that the end reader is actually an LLM. The LLM will be the one spitting out the answers to the actual users via tools like Copilot, but I'm not sure how that should change my approach to structure or level of detail in docs. Should I lean heavier on code examples?

Mathnerd314 | 1 year ago

Well, look at the process of training a chatbot:

- first you make a "raw" corpus, with all the information needed to produce an answer

- then you generate sample question-answer pairs

- then you use AI to make better questions and better answers (look at e.g. WizardLM https://arxiv.org/pdf/2304.12244)

- you can also fine-tune with RLHF or modify the Q-A pairs directly

- then you run a final model fine-tune once the Q-A pairs look good

- then you use RAG over the corpus and the Q-A pairs because the model doesn't remember all the facts

- then you have a bullshit detector to avoid hallucinations
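The RAG step in this pipeline can be sketched minimally. A real system would score chunks with embeddings; plain word overlap is enough to show the shape (the corpus snippets below are invented examples, not from any real docs):

```python
# Naive RAG retrieval over a mixed pool of corpus chunks and Q-A pairs.
# Scoring is bag-of-words overlap; swap in embedding similarity for real use.

def score(query: str, doc: str) -> int:
    """Count how many query words also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-overlap documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

# Hypothetical corpus chunks plus one pre-generated Q-A pair.
docs = [
    "connect() takes a timeout argument in seconds.",
    "Q: How do I set a timeout? A: Pass timeout= to connect().",
    "Logging is configured via the LOG_LEVEL variable.",
]
hits = retrieve("how do I set a connect timeout", docs)
```

The retrieved chunks would then be pasted into the model's context so it answers from the corpus rather than from memory.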

So the corpus is very important, and the Q-A pairs are also important. I would say you've got to make the corpus by hand, or with very specific LLM prompts. Meanwhile, you should be developing the Q-A pairs with LLMs as the project develops - this gives a good indication of what the LLM knows, what needs work, etc. Once you have a good set of Q-A pairs, you could probably publish it as a static website and save money on LLM generation costs, if people don't need super-specific answers.
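The Q-A generation loop described here can be sketched as below. The `ask_llm` function is a hypothetical stand-in for whatever chat API you use, stubbed out so the pipeline shape is runnable; the chunking and prompt wording are illustrative choices, not a prescribed format:

```python
# Sketch: generating Q-A pairs from a docs corpus with an LLM.

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client."""
    return f"[model output for: {prompt[:40]}...]"

def chunk_corpus(corpus: str, max_chars: int = 500) -> list[str]:
    """Split the corpus on blank lines, packing paragraphs into chunks."""
    chunks, current = [], ""
    for para in corpus.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def generate_qa_pairs(corpus: str, per_chunk: int = 2) -> list[tuple[str, str]]:
    """Ask the model for questions about each chunk, then grounded answers."""
    pairs = []
    for chunk in chunk_corpus(corpus):
        for i in range(per_chunk):
            q = ask_llm(f"Write question #{i + 1} a user might ask about:\n{chunk}")
            a = ask_llm(f"Using only this text, answer '{q}':\n{chunk}")
            pairs.append((q, a))
    return pairs
```

Because the pairs are derived mechanically from the corpus, re-running `generate_qa_pairs` after a corpus edit is how the "change it once, regenerate everything" maintenance story works.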

To add to the current top-scoring comment, though (https://news.ycombinator.com/item?id=42326324), one advantage of an LLM-based workflow is that the corpus is the single source of truth. It is true that good documentation repeats itself, but from a maintenance standpoint, changing all the occurrences of a fact, idea, etc. is hard work, whereas changing it once in the corpus and then regenerating all the QA pairs is straightforward.

pault|1 year ago

Ask an LLM to write it?