top | item 39339423

(no title)

It's a demo snippet of how to setup the workflow, it's not meant to be a working production example a self-rewarding model or a faithful reproduction of the original paper. Whether self-rewarding LLMs are a good idea or not, it's a valuable and very active area of research in the literature today. This is a library for ML researchers who should actively research and study these avenues along with the pitfalls you're mentioning. But in order for them to do that, building these workflows have to be accessible to them, which is what this library is meant to do. It's not meant for the "hobbyist" ML-community, they should not be using synthetic data today in this way, it would likely lead to subpar results for any practical model or task.

discuss

sillysaurusx|2 years ago

There’s a lot to unpack here.

First, I’m an ML researcher. I don’t go around saying so because appeal to authority is bogus, but since every one of your comments seems to do this, it’s unavoidable.

You say the code is for ML researchers, then flat out say that it’s not a working production example, nor is it a faithful reproduction of a paper. So what is it?

Whether you want it to be or not, your audience is the hobbyist ML community, because without benchmarks to back up your code examples, no one from the research community will trust your examples without actual proof that they work. That’s the hard part of research, and it’s most of the effort.

My advice is, write something that can train useful models. Implement a production grade workflow, and show some reasons why it works. If you’re trying to get the wider ML research community to buy in to this, there’s not much other way to do it. No one will want to take easy code that does the wrong thing, and most of your examples show the wrong thing to do, like the 90/10 split.

You’re also a bit defensive about accepting feedback. Trust me that it’s better to accept that your code sucks and does the wrong thing, and then try to make it suck less and do the right thing. That’s how the majority of good software is written, unless you’re cperciva. But he’d also publish a paper explaining why his code is correct.

Anyway, the whole point of posting this to HN is to get feedback on it. (If you were hoping that a bunch of people would suddenly use it, then you need to appeal to the hobbyist community. They’ve told you a bunch of things that you’ve straight up said is out of scope.) And it sounds like you were hoping for feedback from ML researchers. Maybe others will chime in, but for now, that’s the best I’ve got.

patelajay285|2 years ago

I think you're interpreting hostility where there is none, so I don't have much to say other than it's an infrastructure library, a demonstration snippet doesn't need to show how to train a production grade model. I appreciate the feedback and it's noted.