I think my favorite of the bunch is the "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model" paper. Easy to read, gets the point across very intuitively and quickly, and the point is very interesting and relevant to a lot of people.
About the Superposition paper - this is close to what I've been thinking about over the past week. I suspect that concepts or choices held in a "superposition" are harder for a fully differentiable neural net to reason about. For example, if there's a "green" vs. "purple" choice to be made, it can't fully commit to either (especially if they're 50-50) and has to reason about both simultaneously, which is difficult in a nonlinear manifold space. Discretizing to tokens (a non-differentiable argmax) forces a choice, and that lets the net reason about a single concept separately and more easily.
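A minimal sketch of the intuition above, with hypothetical 2-D concept embeddings I made up for illustration: under a 50-50 softmax, the differentiable "expected embedding" is a blend that matches neither concept, while an argmax commits fully to one.

```python
import math

# Toy 2-D embeddings for two competing concepts (hypothetical vectors).
E = {"green": [1.0, 0.0], "purple": [0.0, 1.0]}
logits = [0.0, 0.0]  # an exactly 50-50 "green" vs. "purple" choice

exps = [math.exp(z) for z in logits]
p = [e / sum(exps) for e in exps]  # softmax: [0.5, 0.5]

# Differentiable path: the expected embedding is a blend that matches
# neither concept -- the "superposition" the net has to carry forward.
soft = [p[0] * g + p[1] * u for g, u in zip(E["green"], E["purple"])]

# Non-differentiable argmax: decoding to a token commits fully to one
# concept's embedding, so downstream reasoning sees a single concept.
choice = ["green", "purple"][p.index(max(p))]
hard = E[choice]
```

Here `soft` ends up at `[0.5, 0.5]`, equidistant from both concepts, while `hard` is exactly one concept's vector.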
I am not sure how to interpret the first paper's results.
If we use a random number generator, then we will converge to 100% correct answers under pass@n in the limit.
A random number generator will eventually match or outperform every model (for large enough n) whenever top-p is less than 1. The other models will most likely have some bias that makes certain correct CoTs mathematically impossible: the required tokens are too improbable and get filtered out by top-p. Those models therefore asymptote below 100%, while the RNG reaches 100% almost surely.
Under this paper's logic doesn't that mean that the random number generator is a superior reasoner?
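A quick simulation of that argument, under toy assumptions (a 10-answer multiple-choice task, and a hypothetical "truncated model" whose top-p filtering has zeroed out the correct answer entirely): the uniform RNG's pass@n climbs toward 1 as n grows, while the truncated model stays at 0 forever.

```python
import random

random.seed(0)

def pass_at_n(sample_answer, correct, n, trials=2000):
    """Empirical pass@n: fraction of trials in which at least one
    of n independent samples equals the correct answer."""
    hits = 0
    for _ in range(trials):
        if any(sample_answer() == correct for _ in range(n)):
            hits += 1
    return hits / trials

ANSWERS = list(range(10))
CORRECT = 7

# Uniform "random number generator" baseline: every answer has
# probability 0.1, so pass@n -> 1 as n grows.
def rng_sampler():
    return random.choice(ANSWERS)

# A biased model whose top-p truncation has made the correct answer
# impossible: it asymptotes at 0 no matter how large n gets.
def truncated_sampler():
    return random.choice([a for a in ANSWERS if a != CORRECT])

for n in (1, 8, 64):
    print(n, pass_at_n(rng_sampler, CORRECT, n),
             pass_at_n(truncated_sampler, CORRECT, n))
```

At n=64 the RNG's miss probability is 0.9^64 ≈ 0.001, so its empirical pass@64 is essentially 1, while the truncated sampler never scores.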
> Responses to the query “Write a metaphor about time” clustered by applying PCA to reduce sentence embeddings to two dimensions. […] The responses form just two primary clusters: a dominant cluster on the left centered on the metaphor “time is a river,” and a smaller cluster on the right revolving around variations of “time is a weaver.”
I just gave Gemini 3 the same prompt and got something quite different:
>Time is a patient wind against the cliff face of memory. It does not strike with a hammer to break us; it simply breathes, grain by grain, until the sharp edges of grief are smoothed into rolling hills, and the names we thought were carved in stone are weathered into soft whispers.
Constantly flowing, smoothing things like river stones; compared to Tait's "time is a series of static pictures," Gemini's output is not so different from a river metaphor.
Hearing the clarity, creativity, and force behind his thoughts and speech, I'd give a better than 1/200 chance that Ali Behrouz gets himself a Turing Award. At the very least, I think he will end up making major contributions to AI.
Interesting that three of the names I recognized are physicists from stat-mech-adjacent fields. They continue to punch above expectations (as sampled by the general dismissal of physicists in AI/ML on HN and Reddit).
Some of the best software engineers I know are ex-physics PhDs… it’s one of those “can’t fake it” skillsets that also happens to have high transferability to ML/AI fields. On the other hand, I snuck through the CS major without ever multiplying a matrix.
Are there any talks about these papers on YouTube or somewhere? I think I find it easier to listen and watch than to read, or maybe I'm just lazy, not sure.
Most papers have slides with audio, and some, including the award ones, will have short frontal talks. These will be released at some point after the conference, but right now it looks like you'd have to be registered to see them.
mountainriver|2 months ago
I believe Nvidia's ProRL showed otherwise, right?
robrenaud|2 months ago
https://sakana.ai/ctm/
In terms of a fresh perspective on designing learning systems, nested learning seems very interesting.
https://abehrouz.github.io/files/NL.pdf
djoldman|2 months ago
These days, abstracts are so marketing/advertising-forward that it's hard to even understand the claim.