> I personally think that Gemini 2.5 Pro's superiority comes from having hundreds or thousands of RL tasks (without any proof whatsoever, so rather a feeling).
Given that GDM pioneered RL, that's a reasonable assumption.
Assuming by GDM you mean Google DeepMind: they didn't pioneer RL itself, but rather RL with deep nets as function approximators. Those deep nets were in turn a result of CNNs and massive improvements in hardware parallelization at the time.
flowerthoughts|9 months ago
RL was established, at the latest, with Q-learning in 1989: https://en.wikipedia.org/wiki/Q-learning
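For context, Watkins' Q-learning predates deep nets entirely: it keeps a table of state-action values and updates them with Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)). A minimal sketch on a made-up 5-state chain environment (the chain MDP, constants, and names here are all illustrative, not from any particular paper):

```python
import random

# Toy deterministic chain: states 0..4, actions 0 = left, 1 = right.
# Reaching the rightmost state (4) yields reward 1.0 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # step size, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition: action 1 moves right, 0 moves left."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(500):  # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # Watkins' Q-learning update (tabular, off-policy):
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy should move right everywhere.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

No neural network anywhere; DeepMind's later contribution (DQN) was replacing the Q table with a deep net when the state space is too large to enumerate.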
t55|9 months ago
I still think my original statement is fair.