top | item 42213554

(no title)

keskival | 1 year ago

"I’m not sure, because OpenAI doesn’t deign to share gpt-4-base, nor to allow queries of gpt-4o in completion mode."

I would guess GPT-4o isn't first pre-trained and then instruct-tuned, but trained directly with refined instruction-following material.

This material probably contains way fewer chess games.

discuss

order

toxik|1 year ago

Why do you think that? InstructGPT was predominantly trained as a next-token predictor on whatever soup of data OpenAI curated at the time. The alignment signal (both RL part and the supervised prompt/answer pairs) are a tiny bit of the gradient.