item 38394413

Open-source AI Feedback framework for scalable LLM Alignment

13 points | dvilasuero | 2 years ago | github.com

2 comments


dvilasuero|2 years ago

Hey there!

We've just released a new open-source project for AI feedback to build datasets for RLHF-related methods (like DPO).

Recent projects like Zephyr and AllenAI's Tulu have shown it's possible to build powerful open-source models with DPO and AI Feedback (AIF) datasets.

There's a lot of exciting research in the AIF space, such as UltraFeedback (the dataset leveraged by Zephyr and Tulu), JudgeLM, or Prometheus.

However, going beyond research efforts and applying AIF at scale is a different challenge. For enterprise and production use, we need a framework that implements key AIF methods in a robust, efficient, and scalable way. This framework should enable AI engineers to build custom datasets and scale them for their own use cases.

This, combined with humans-in-the-loop for improving dataset quality, is the next big leap for open-source LLMs.

distilabel aims to bridge this gap.
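To make the idea concrete, here is a minimal sketch (not the distilabel API) of the core AIF pattern the post describes: an LLM "judge" scores candidate completions for each prompt, and the best/worst pair becomes a DPO-style preference record. The `toy_judge` below is a stand-in heuristic; a real pipeline would prompt an LLM rater (e.g. an UltraFeedback-style judge) instead.

```python
# Illustrative sketch of AI Feedback (AIF) dataset construction for DPO.
# Names here (PreferencePair, build_dpo_dataset, toy_judge) are hypothetical,
# not part of distilabel.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # higher-rated completion
    rejected: str  # lower-rated completion

def build_dpo_dataset(
    rows: List[dict],
    judge: Callable[[str, str], float],
) -> List[PreferencePair]:
    """Rate every completion with the judge and keep the
    best/worst pair per prompt for DPO training."""
    pairs = []
    for row in rows:
        scored = sorted(
            row["completions"],
            key=lambda c: judge(row["prompt"], c),
            reverse=True,
        )
        if len(scored) >= 2:
            pairs.append(PreferencePair(row["prompt"], scored[0], scored[-1]))
    return pairs

# Stand-in judge for demonstration only: prefers longer answers.
# A real AIF pipeline would call an LLM to score helpfulness, honesty, etc.
def toy_judge(prompt: str, completion: str) -> float:
    return float(len(completion))

data = [{
    "prompt": "What is DPO?",
    "completions": [
        "A method.",
        "Direct Preference Optimization trains on preference pairs.",
    ],
}]
pairs = build_dpo_dataset(data, toy_judge)
print(pairs[0].chosen)
```

The key design point is that the judge is pluggable: swapping the heuristic for an LLM call (and adding human review of low-confidence ratings) gives the humans-in-the-loop workflow mentioned above.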

We'd love your feedback!

laguitte|2 years ago

It's really interesting to see open-source tools like Argilla pushing the field to let open-source models get trained the way OpenAI's models are.