

rybosome | 6 months ago

No problem. Although I'm out of that particular role, it's appropriate to discuss, since the company already shared these details in an OpenAI press release a few months back.

I fine-tuned reasoning models (o1-mini and o3-mini) that were already well trained for instruction-following and reasoning behavior. The dataset I prepared took this into account, but it was just simple prompt/response pairs. Defining the task tightly, ensuring the dataset was high quality, picking the right hyperparameters, and preparing the proper reward function (and modeling it against the API provided) were the keys to success.
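For readers curious what "simple prompt/response pairs" plus a reward function can look like, here is a minimal local sketch. It is an assumption-heavy illustration, not the setup described above: the field names (`prompt`, `reference`), the helper names (`to_jsonl`, `normalize`, `reward`), and the scoring rule are all hypothetical, and this is not the schema or grader format of any fine-tuning API. A real reward would have to be expressed through whatever grading mechanism the provider's API supports.

```python
import json
import re

# Hypothetical prompt/response pairs; field names are illustrative only,
# not the exact schema any fine-tuning API expects.
examples = [
    {"prompt": "What is 12 * 7? Answer with a number only.", "reference": "84"},
    {"prompt": "Name the capital of France in one word.", "reference": "Paris"},
]


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


def normalize(text):
    """Lowercase, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def reward(model_output, reference):
    """Toy reward in [0.0, 1.0] (hypothetical scoring rule):
    1.0 for an exact normalized match, 0.5 if the reference appears
    somewhere in the output, otherwise 0.0."""
    out, ref = normalize(model_output), normalize(reference)
    if out == ref:
        return 1.0
    if ref in out:
        return 0.5
    return 0.0


if __name__ == "__main__":
    print(to_jsonl(examples), end="")
    print(reward("  84 ", "84"))               # exact match after normalizing
    print(reward("The answer is 84.", "84"))   # reference found inside output
    print(reward("85", "84"))                  # wrong answer
```

The partial-credit substring rule is deliberately crude (it would also reward "184" for a reference of "84"); tightening the task definition and the reward is where most of the effort tends to go.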


rbanffy | 6 months ago

That’s really cool. I’d love to see that process from up close.