> I personally think that Gemini 2.5 Pro's superiority comes from having hundreds or thousands of RL tasks (without any proof whatsoever, so rather a feeling).
Given that GDM pioneered RL, that's a reasonable assumption.
Assuming by GDM you mean Google DeepMind: they didn't pioneer RL itself, but rather RL with deep nets as function approximators. Those deep nets were in turn a result of CNNs and massive improvements in hardware parallelization at the time.
flowerthoughts|9 months ago
RL was established, at the latest, with Q-learning in 1989: https://en.wikipedia.org/wiki/Q-learning
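For context, Watkins' Q-learning predates deep nets entirely: it keeps a table of state-action values and updates them with Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)). A minimal sketch on a made-up 5-state chain environment (the chain MDP, constants, and names here are all illustrative, not from any particular paper):

```python
import random

# Toy deterministic chain: states 0..4, actions 0 = left, 1 = right.
# Reaching the rightmost state (4) yields reward 1.0 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # step size, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition: action 1 moves right, 0 moves left."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(500):  # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # Watkins' Q-learning update (tabular, off-policy):
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy should move right everywhere.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

No neural network anywhere; DeepMind's later contribution (DQN) was replacing the Q table with a deep net when the state space is too large to enumerate.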
t55|9 months ago
I still think my original statement is fair.