
(no title)

Hi. So quickly:

* RL is Reinforcement Learning. It has been used for a while as part of RLHF, but we have now started to find a very nice combo of reasoning + RL on verifiable tasks. The core idea is that models are not just good at predicting the next token but also the next right answer.

* I think anything in infra that already has some ML bundled is especially up for grabs, but this will have a more transformative impact than your usual SaaS. Network engineering is a good example: highly formalized but also highly complex. RL models could increasingly nail that.


dcow|11 months ago

Respectfully, when you’re responding to someone who doesn’t know what RL is, and you say “it’s this, already used in [another even lesser known acronym that includes the original]…” it doesn’t really help the asker (if you know what RLHF is, then you know what RL is). I’ll admit I already knew what RL was, but I don’t know what RLHF is, and the comment just confuses me.

What is RLHF?

diggan|11 months ago

Am I the only one who uses a search engine while reading comment threads about industries/technologies I am not familiar with? This whole conversation is like two searches away from explaining everything (or a two-minute conversation with an LLM, I suppose).

furyofantares|11 months ago

This sounds impossible but I would guess RLHF is actually a better known acronym than RL. It became fairly popularly known among tech folks with no AI experience when ChatGPT came out.

npodbielski|11 months ago

Thanks. And what about some more user-focused tasks? E.g., I have a small but fairly profitable company that writes specialized software for accountants. Usually it is pretty complex: tax law tends to change very often, and there are myriad rules, exemptions, etc. Could this be solved with ML? How long until we get there, if at all? How costly would this be? Disclaimer: I do not write such software. This is just an example.