top | item 47205229

DavidSJ | 14 hours ago

In microgpt, there's no alignment step. It's all pretraining (learning to predict the next token). Production models, though, also go through post-training, often with some form of reinforcement learning, which adjusts the model so that it produces a different probability distribution over output tokens.

But the model's "shape" and computation graph don't change as a result of post-training. All that changes is the values of the weights in the matrices.
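
To make that concrete, here is a toy sketch in plain Python. It is not microgpt's code and not a real post-training method: the "model" is a single hypothetical weight matrix followed by a softmax, and "post-training" is stood in for by one gradient-ascent step on the log-probability of a preferred token. All names (`forward`, `shape`, `pretrained`, `post_trained`, `preferred`) are made up for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, x):
    # The computation graph: logits = W @ x, then softmax.
    # It is the same function no matter which weights are passed in.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)

def shape(weights):
    return (len(weights), len(weights[0]))

random.seed(0)
vocab_size, dim = 4, 3  # assumed toy sizes
pretrained = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]

x = [1.0, 0.5, -0.5]    # assumed input features
preferred = 2           # token the toy "post-training" should favor
lr = 0.5

# One gradient-ascent step on log p(preferred | x).
# d log p_k / d logit_t = 1[t == k] - p_t, and d logit_t / d W[t][j] = x[j].
probs = forward(pretrained, x)
post_trained = [row[:] for row in pretrained]
for t in range(vocab_size):
    grad_logit = (1.0 if t == preferred else 0.0) - probs[t]
    for j in range(dim):
        post_trained[t][j] += lr * grad_logit * x[j]

before = forward(pretrained, x)
after = forward(post_trained, x)

print("shape before:", shape(pretrained), "shape after:", shape(post_trained))
print("p(preferred) before:", before[preferred])
print("p(preferred) after: ", after[preferred])
```

The same `forward` function runs on both sets of weights and the matrix keeps its shape. Only the numbers inside it differ, and that is enough to shift the output distribution toward the preferred token.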
