top | item 39339031

patelajay285 | 2 years ago

Yes it is :), but the library is also a synthetic data generation library, so for example you can create the data for DPO fully synthetically; check out the self-rewarding LLMs example:

https://datadreamer.dev/docs/latest/pages/get_started/quick_...
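To make the DPO mention concrete, here is a minimal sketch of what a preference-pair dataset for DPO typically looks like (this is an illustrative generic shape, not DataDreamer's actual schema; `make_dpo_example` is a hypothetical helper): each record pairs a prompt with a preferred and a dispreferred response.

```python
# Generic sketch of a DPO-style preference record (illustrative only,
# NOT DataDreamer's schema): each example pairs a prompt with a
# preferred ("chosen") and a dispreferred ("rejected") response.
def make_dpo_example(prompt, chosen, rejected):
    """Build one preference pair; a preference dataset is a list of these."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

dataset = [
    make_dpo_example(
        "Explain binary search in one sentence.",
        "Binary search halves a sorted range each step until the target is found.",
        "It just looks for stuff.",
    ),
]
print(len(dataset), sorted(dataset[0].keys()))
```

In a fully synthetic setup, both the prompts and the chosen/rejected completions would be produced by a model rather than written by hand.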

sillysaurusx | 2 years ago

I’m extremely skeptical of this approach. Until proven otherwise, with a model that users actually find useful, I don’t think this can work.

It would be nice. But I’ve seen too many nice ideas completely fall apart in practice to accept this without some justification. Even if there are papers on the topic, and those papers show that the models rank highly according to some eval metrics, the only metric that truly matters is "the user likes the model and it solves their problems."

By the way, on a separate topic, the 90/10 dataset split that you do in all of your examples turns out to be fraught with peril in practice. The issue is that the validation dataset quality turns out to be crucial, and randomly yeeting 10% of your data into the validation dataset without manual review is a recipe for problems.
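For reference, the kind of 90/10 split being criticized looks roughly like this (a minimal sketch; `split_dataset` is a hypothetical helper, not any library's API). At minimum, pinning the random seed makes the held-out 10% reproducible, so it can actually be reviewed by hand rather than re-randomized on every run:

```python
# Hypothetical sketch of a reproducible 90/10 train/validation split.
# Fixing the seed means the validation slice is stable across runs,
# which is a prerequisite for manually reviewing its quality.
import random

def split_dataset(examples, val_fraction=0.1, seed=42):
    """Shuffle a copy with a fixed seed, then hold out the first
    val_fraction of it as the validation set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

data = [f"example-{i}" for i in range(100)]
train, val = split_dataset(data)
print(len(train), len(val))  # 90 10
```

A random split like this only guarantees the sets are disjoint, not that the validation examples are high quality, which is exactly the pitfall described above.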

patelajay285 | 2 years ago

It's a demo snippet of how to set up the workflow; it's not meant to be a working production example of a self-rewarding model or a faithful reproduction of the original paper. Whether or not self-rewarding LLMs are a good idea, they are a valuable and very active area of research in the literature today. This is a library for ML researchers, who should actively research and study these avenues along with the pitfalls you're mentioning. But for them to do that, building these workflows has to be accessible to them, which is what this library is meant to do. It's not meant for the "hobbyist" ML community; they should not be using synthetic data in this way today, as it would likely lead to subpar results for any practical model or task.