The data set quality seems really spotty based on looking at a few random problems (I looked at about a dozen in the "Physics" subcategory). Several problems had no clear question (or answer) and seemed to be clipped from some longer resource, and thus had back references to sections and chapters that the models clearly couldn't follow. Worse, the verification of the answer seems to be done via an LLM and isn't all that reliable; I saw several answers marked correct when they clearly weren't, and several that were correct but not in the precise form given as "the" answer and were thus labelled incorrect.
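To make the last failure mode concrete: if the verifier falls back to comparing answer strings, a mathematically correct answer in a different form gets rejected. A minimal sketch (hypothetical; not the dataset's actual verifier) using only Python's stdlib:

```python
from fractions import Fraction

reference = "1/2"   # "the" answer as given in the dataset
model_out = "0.5"   # mathematically the same value, different form

# Naive string comparison: flags the correct answer as wrong
print(reference == model_out)  # False -> labelled incorrect

# Numeric equivalence: Fraction parses both "1/2" and "0.5" exactly
print(Fraction(reference) == Fraction(model_out))  # True
```

Real verifiers need more than this (units, symbolic expressions, tolerances), which is presumably why an LLM judge was used, but the example shows why exact-form matching alone produces false negatives.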
rosstaylor90|1 year ago
- The initial use of the data is distillation, so we’re less bound by question quality (anything that elicits output diversity is good).
- But moving onto RL, we’ll need stronger quality. We have much better things planned both on data filtering and verification!
- Surprisingly, a lot of ML datasets actually look like this when you look under the hood. We’re hoping that having more eyeballs on it will help improve quality in the long run, versus the less transparent status quo!
eternityforest|1 year ago
It makes sense for ASI research, I suppose, but why are we trying to teach small models to do things almost no humans even attempt?
What happens if you train them with RAG context in the prompts and calculator calls in the CoT?