billmalarky | 1 year ago

This post is using regression to build a reward model. The reward model will then be used (in a future post) to build the overall RL system.

Here's the relevant text from the article:

>In this post we’ll discuss how to build a reward model that can predict the upvote count that a specific HN story will get. And in follow-up posts in this series, we’ll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!
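To make the "regression as reward model" idea concrete, here is a minimal sketch. It is not the article's method: the single numeric feature, the made-up training data, the log transform of upvotes, and the names `fit_line`, `reward`, and `candidates` are all assumptions chosen for illustration. It only shows the shape of the idea: fit a regressor on (story, upvotes) pairs, then use its prediction as a score.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

# Toy, made-up data: one hypothetical numeric feature per story.
toy_feature = [1.0, 2.0, 3.0, 4.0]
toy_upvotes = [1, 3, 9, 20]

# Assumption: regress on log(1 + upvotes) to tame the heavy tail.
targets = [math.log1p(u) for u in toy_upvotes]
a, b = fit_line(toy_feature, targets)

def reward(feature):
    """Predicted log-upvotes, used as the reward signal."""
    return a * feature + b

# Supervised use: score hypothetical candidates and pick the best one.
candidates = {"story_a": 1.5, "story_b": 3.5}
best = max(candidates, key=lambda name: reward(candidates[name]))
```

Everything here is plain supervised learning. The RL part would only begin once something like `reward` is used to train a separate model that writes stories.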

jampekka | 1 year ago

The title is misleading. The $4.80 is spent on supervised learning to find the best post.

The post is interesting and I'll be sure to check out the next parts too. It's just that people, as evidenced by this thread, clearly misunderstood or were misled about what was done.