Zollerboy1 | 11 months ago

Wow! Thanks for taking the time to think through it. Yes, you are exactly right! I couldn’t have described Augento better myself. We actually want to make writing a reward function completely optional and plan to build an RLHF (Reinforcement Learning from Human Feedback) loop soon. One of our long-term goals is to bring down the cost of RL so that the barrier to entry for fine-tuning big models is lower than it is today.
