Dorialexander | 11 months ago
* RL is Reinforcement Learning. It has been used for a while as part of RLHF, but we have now started to find a very nice combination of reasoning + RL on verifiable tasks. The core idea is that models are not just good at predicting the next token but at predicting the next right answer.
* I think any infra product that already bundles some ML is especially up for grabs, and this will have a more transformative impact than your usual SaaS. Network engineering is a good example: highly formalized but also highly complex. RL models could increasingly nail that.
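
To make "verifiable task" a bit more concrete, here is a minimal sketch of what a reward check could look like. Everything in it is an assumption for illustration: the `Answer:` marker convention and the function names `extract_final_answer` and `verifiable_reward` are hypothetical, not taken from any particular training setup.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none.

    The 'Answer:' marker is an assumed output convention for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, expected: str) -> float:
    """Give 1.0 when the extracted answer matches the known result, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == expected.strip() else 0.0


# The reasoning text before the marker is ignored; only the final answer is checked.
print(verifiable_reward("2 + 2 is 4.\nAnswer: 4", "4"))
print(verifiable_reward("I guess it is 5.\nAnswer: 5", "4"))
```

The point is only that the reward comes from an automatic check against a known result rather than from a human preference label, which is roughly what separates this from RLHF.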
dcow|11 months ago
What is RLHF?
diggan|11 months ago
furyofantares|11 months ago
npodbielski|11 months ago