top | item 47162472

(no title)

auraham | 4 days ago

I highly recommend Build a large language model from scratch [1] by Sebastian Raschka. It provides a clear explanation of the building blocks used in the first versions of ChatGPT (GPT 2 if I recall correctly). The output of the model is a huge vector of n elements, where n is the number of tokens in the vocabulary. We use that huge vector as a probability distribution to sample the next token given an input sequence (i.e., a prompt). Under the hood, the model has several building blocks like tokenization, skip connections, self attention, masking, etc. The author makes a great job explaining all the concepts. It is very useful to understand how LLMs works.

[1] https://www.manning.com/books/build-a-large-language-model-f...

discuss

order

phreeza|4 days ago

But this is missing exactly the gap which OP seems to have, which is going from a next token predictor (a language model in the classical sense) to an instruction finetuned, RLHF-ed and "harnessed" tool?