We are thinking of something like this (a curriculum approach) for further training. We didn't want to do it for the current work, where the emphasis is on evaluations, because the "difficulty level" of different tasks is quite subjective. We would have to make arbitrary decisions that could affect the evals (e.g. which tasks should follow which scenarios, how to ensure sufficient coverage across all difficulty levels, etc.).
infogulch|11 months ago
> the difficulty level of different tasks is subjective
That makes sense. I wonder if the difficulty of different scenarios could be derived by assuming a partial ordering and ranking scenarios by how training on one affects performance on another. For example, if the model performs better on scenario T when it trains on scenario A first, but training on scenario B first doesn't help with T, then infer A < T and B ? T (no ordering inferred).
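A rough sketch of that idea, assuming you already have a baseline score for each scenario and a score for scenario T after first training on scenario A. The function names, the `threshold` value, and the numbers in the example are all hypothetical. Any transfer gain above the threshold is read as "A < T" (A is a prerequisite of T). No gain means no edge, which is the "B ? T" case. A topological sort of the resulting graph then gives one curriculum consistent with the partial order.

```python
from graphlib import TopologicalSorter


def infer_prerequisites(baseline, transfer, threshold=0.05):
    """Build a prerequisite graph from transfer measurements.

    baseline[t]       -- score on scenario t trained without a warm-up scenario
    transfer[(a, t)]  -- score on t after training on scenario a first
    threshold         -- hypothetical minimum gain that counts as "a helps t"
    """
    prereqs = {t: set() for t in baseline}
    for (a, t), score in transfer.items():
        if a != t and score - baseline[t] > threshold:
            prereqs.setdefault(t, set()).add(a)  # infer a < t
        # otherwise: no edge, i.e. "a ? t" (unknown / incomparable)
    return prereqs


def curriculum(prereqs):
    """Return one ordering that respects every inferred a < t.

    Raises graphlib.CycleError if the measurements contradict a partial
    order (e.g. A helps T and T also helps A).
    """
    return list(TopologicalSorter(prereqs).static_order())


# Hypothetical scores, for illustration only.
baseline = {"A": 0.60, "B": 0.50, "T": 0.30}
transfer = {("A", "T"): 0.45, ("B", "T"): 0.31}

prereqs = infer_prerequisites(baseline, transfer)
order = curriculum(prereqs)
print(prereqs["T"], order)  # A is a prerequisite of T; A precedes T in the order
```

In practice each transfer measurement would probably need several seeds before a small gain could be trusted, so the threshold is doing a lot of work here. Mutual transfer (A helps T and T helps A) would surface as a cycle, which is arguably a useful signal that the two scenarios are closer to "same difficulty" than ordered.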