It's a good way to frame base models that have only been pretrained.

However, modern frontier models have undergone rounds of fine-tuning, RLHF (reinforcement learning from human feedback), and RLVR (reinforcement learning with verifiable rewards) that turn them into something else. The compressed internet is still in there, but it's wrapped in problem-solving and people-pleasing circuitry.
