Wow! Thanks for taking the time to think this through. Yes, you are exactly right! I couldn’t have described Augento better myself. We actually want to make writing a reward function completely optional and build an RLHF (Reinforcement Learning from Human Feedback) loop soon. One of our long-term goals is to bring down the cost of RL so that the barrier to entry for fine-tuning big models is not as high as it currently is.