As I wrote elsewhere, I interpret this distinction as “RL in real time” vs. “RL beforehand”.

This is referred to as “online reinforcement learning” and is already done by, for example, Cursor for their tab prediction model.

https://cursor.com/blog/tab-rl

tinodb|4 months ago

Not sure that’s the same. They just very frequently retrain and “deploy a new model”.
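
To make the distinction concrete, here is a minimal sketch of the two setups being contrasted. It is a toy, not a description of Cursor's system: the class names `OnlineAgent` and `PeriodicallyRetrainedAgent` and the `deploy()` method are hypothetical, and a simple running-average value estimate stands in for real RL training.

```python
class OnlineAgent:
    """Toy agent whose behaviour changes after every single interaction."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions  # running average reward per action

    def act(self):
        # Greedy choice; ties resolve to the lowest action index.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def observe(self, action, reward):
        # Incremental mean update, applied immediately.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class PeriodicallyRetrainedAgent:
    """Toy agent that serves a frozen snapshot until deploy() is called."""

    def __init__(self, n_actions):
        self._trainer = OnlineAgent(n_actions)          # learns from logged feedback
        self._served_values = list(self._trainer.values)  # what users actually get

    def act(self):
        return max(range(len(self._served_values)),
                   key=self._served_values.__getitem__)

    def observe(self, action, reward):
        # Feedback is logged and trained on, but the served policy is untouched.
        self._trainer.observe(action, reward)

    def deploy(self):
        # Hypothetical release step: swap in the newly trained snapshot.
        self._served_values = list(self._trainer.values)


online = OnlineAgent(2)
online.observe(online.act(), -1.0)   # one bad outcome changes behaviour at once

batch = PeriodicallyRetrainedAgent(2)
batch.observe(batch.act(), -1.0)     # same feedback, served policy unchanged
before_deploy = batch.act()
batch.deploy()                       # behaviour only changes at release time
after_deploy = batch.act()
```

In the second class, the gap between `observe` and `deploy` is where testing before release could happen, however frequent the releases are.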

munchler|4 months ago

I agree with this description, but I'm not sure we really want our AI agents evolving in real time as they gain experience. Having a static model that is thoroughly tested before deployment seems much safer.

mbesto|4 months ago

> Having a static model that is thoroughly tested before deployment seems much safer.

While that might be true, it fundamentally means it's never going to replicate human intelligence or provide superintelligence.