adroniser | 1 year ago
To be clear, I'm not claiming that you can take an LLM, do some RL on it, and suddenly it can do particular tasks. I'm saying that if you train it from scratch using RL, it will be able to do certain well-defined formal tasks.
Idk what you mean about the online learning ability tbh. The paper uses it in exactly the way you specify: it uses RL to play Montezuma's Revenge and gets better on the fly.
Similar to my point about the inference-time RL ability of the AlphaProof LLM. That's why I emphasized that the RL is done at inference time: each proof it does, it uses to make itself better for the next one.
I think you are taking "LLM" to mean GPT-style models, while I am taking "LLM" to mean transformers that output text, which can be trained to do any variety of things.
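The "gets better on the fly" idea above can be sketched with the simplest form of online RL: tabular Q-learning, where value estimates are updated after every single step, so later episodes benefit from earlier ones. The 5-state corridor environment here is purely illustrative (an assumption for the sketch, not Montezuma's Revenge or AlphaProof): the agent starts in state 0 and gets reward 1 for reaching state 4.

```python
import random

def run_online_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy 5-state corridor (illustrative only)."""
    rng = random.Random(seed)
    n_states, actions = 5, (+1, -1)            # step right or step left
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = 0, 0
        while s != n_states - 1:
            # epsilon-greedy: mostly exploit current estimates, sometimes explore
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # the online update: learning happens during play,
            # not in a separate training phase
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)])
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode

q, steps = run_online_q_learning()
print(steps[0], steps[-1])   # later episodes tend toward the 4-step optimum
```

The point of the sketch is only the update rule's placement: it runs inside the episode loop, so every interaction immediately reshapes the policy used for the rest of play.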
HarHarVeryFunny | 1 year ago
Narrow skills like playing chess (Deep Blue), Go, or finding math proofs are impressive in some sense, but not the same as the generality and/or intelligence that are the hallmarks of AGI. Note that AlphaProof, as the name suggests, has more in common with AlphaGo and AlphaFold than with a plain transformer: it's a hybrid neuro-symbolic approach where the real power comes from the search/verification component. Sure, RL can do some impressive things when the right problem presents itself, but it's not a silver bullet for all machine learning problems, and few outside of David Silver think it's going to be the (or a) way to achieve AGI.
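The search/verification pattern described above can be sketched as a generate-and-verify loop: a proposer enumerates candidates and a symbolic checker accepts or rejects them, so correctness comes from the verifier rather than the proposer. Everything here is a toy assumption for illustration (arithmetic expressions checked by `eval`), not AlphaProof's actual architecture.

```python
import itertools

def verify(expr, target):
    """Symbolic check: does the expression evaluate to the target?
    A stand-in for a formal proof checker (illustrative assumption)."""
    try:
        return eval(expr) == target
    except ZeroDivisionError:
        return False

def search(numbers, target, max_len=3):
    """Enumerate candidate expressions (the 'proposal' side) and keep only
    those the verifier accepts. In a real system a learned model would
    rank proposals instead of brute-force enumeration."""
    ops = ['+', '-', '*']
    for n in range(1, max_len + 1):
        for nums in itertools.permutations(numbers, n):
            for op_seq in itertools.product(ops, repeat=n - 1):
                expr = str(nums[0])
                for op, num in zip(op_seq, nums[1:]):
                    expr += op + str(num)
                if verify(expr, target):
                    return expr        # verified, so guaranteed correct
    return None

print(search([2, 3, 4], 10))
```

The design point: whatever the proposer emits, only verifier-approved answers escape the loop, which is why the hybrid's reliability is attributed to the symbolic side.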
adroniser | 1 year ago