dimask|1 year ago

> How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems. During my mathematics education I had to practice solving a lot of problems dissimilar to what I had seen before. Even in the theory part, a lot of the work was actually about filling in details in proofs and arguments, and reformulating challenging steps (in words or drawings). The notes I write on top of a mathematics textbook amount to much more than the text itself.

People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions. The original article is spot on that there is no AGI pathway in the current research direction. But there are huge incentives for ignoring this.

naasking|1 year ago

> Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems.

I think it's more accurate to say that they learn math by memorizing a sequence of steps that result in a correct solution, typically by following along with some examples. Hopefully they also remember why each step contributes to the answer as this aids recall and generalization.

The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly. This is just standard training. Understanding the motivation of each step helps with that memorization, and also allows you to apply that step in novel problems.

> The original article is spot on that there is no AGI pathway in the current research direction.

I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits for problem solving if trained enough, and parametric memory generalizes their operation to many more tasks.
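
For a concrete sense of what grokking looks like, here is a minimal sketch (assuming PyTorch; the architecture and hyperparameters are illustrative, not any particular paper's exact setup): a small network trained on modular addition with heavy weight decay, where test accuracy typically jumps long after training accuracy saturates.

```python
# A minimal grokking sketch (assumes PyTorch; sizes/hyperparameters are
# illustrative): learn (a + b) mod P from half of all pairs. With heavy
# weight decay, train accuracy saturates early while test accuracy stays
# low for a long time, then jumps -- the memorization-to-generalization
# transition described above.
import torch
import torch.nn as nn

P = 97
torch.manual_seed(0)
pairs = torch.tensor([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P
idx = torch.randperm(len(pairs))
split = len(pairs) // 2                       # train on 50% of all pairs
train_x, test_x = pairs[idx[:split]], pairs[idx[split:]]
train_y, test_y = labels[idx[:split]], labels[idx[split:]]

class TinyNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.embed = nn.Embedding(P, dim)     # one vector per residue
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(),
                                 nn.Linear(256, P))
    def forward(self, x):
        return self.mlp(self.embed(x).flatten(1))

model = TinyNet()
# Strong weight decay is the usual ingredient that pushes the network
# from the memorizing solution toward the generalizing one.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            tr = (model(train_x).argmax(-1) == train_y).float().mean()
            te = (model(test_x).argmax(-1) == test_y).float().mean()
        print(f"step {step:6d}  train {tr:.2f}  test {te:.2f}")
```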

They have now been able to achieve near-perfect accuracy on comparison tasks, where GPT-4 barely reaches a double-digit success rate.

Composition tasks are still challenging, but parametric memory is a big step in the right direction for that too. Accurate comparative and compositional reasoning sounds tantalizingly close to AGI.

Vetch|1 year ago

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Simply memorizing sequences of steps is not how mathematics learning works; otherwise we would not see so much variation in outcomes. Terence Tao and I, given exactly the same math training data, would not end up as two mathematicians of similar skill.

While it's true that memorizing properties, structures, operations, and what should be applied when and where is involved, there is a much deeper component: knowing how these all relate to each other and grasping their fundamental meaning and structure. Some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations from just a description or only a few examples (or to be able to do it at all, where everyone else struggles).

> I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits

It's worth noting that for composition, which is key to abstract reasoning, LLMs failed to generalize to out-of-distribution examples even on simple synthetic data.

From: https://arxiv.org/abs/2405.15071

> The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.
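
For context, here is a loose sketch of the flavor of such synthetic data (my approximation, not the paper's actual construction): atomic (head, relation) -> tail facts, plus two-hop composition queries, with queries over held-out starting entities serving as the out-of-distribution test.

```python
# A loose sketch (mine, not the paper's actual generator) of synthetic
# composition data: atomic facts (head, relation) -> tail, plus two-hop
# queries. Queries whose starting entity is held out of training serve
# as the out-of-distribution test set.
import random

random.seed(0)
ENTITIES = [f"e{i}" for i in range(100)]
RELATIONS = [f"r{j}" for j in range(10)]

# One tail entity per (head, relation) pair.
atomic = {(h, r): random.choice(ENTITIES) for h in ENTITIES for r in RELATIONS}

def compose(h, r1, r2):
    """Two-hop lookup: follow r1 from h, then r2 from the intermediate."""
    return atomic[(atomic[(h, r1)], r2)]

ood_heads = set(random.sample(ENTITIES, 20))  # never composed in training
train, ood_test = [], []
for h in ENTITIES:
    for r1 in RELATIONS:
        for r2 in RELATIONS:
            ex = (h, r1, r2, compose(h, r1, r2))
            (ood_test if h in ood_heads else train).append(ex)

print(len(train), "in-distribution queries,", len(ood_test), "OOD queries")
```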

shkkmo|1 year ago

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Perhaps that is how you learned math, but it is nothing like how I learned math. Memorizing steps did not help; I sucked at it. What works for me is understanding the steps and why we use them. Once I understood the process and why it worked, I was able to reason my way through it.

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly.

Did you look at the types of problems presented by the ARC-AGI test? I don't see how memorization plays any role.

> They have now been able to achieve near-perfect accuracy on comparison tasks, where GPT-4 barely reaches a double-digit success rate.

Then let's see how they do on the ARC test. While it is possible that generalized circuits can develop in LLMs with enough training, I am pretty skeptical until we see results.

imtringued|1 year ago

Every time I see people online reduce the human thinking process to just the production of a perceptible output, I start questioning whether somehow I am the only human on this planet capable of thinking and everyone else is just pretending. That can't be right; it doesn't add up.

The answer is that both humans and the model are capable of reasoning, but the model is more restricted in the reasoning it can perform, since it must conform to the dataset. This means the model is not allowed to invest in tokens that do not immediately represent an answer but would have to be derived on the way to the answer. Since these thinking tokens are not part of the dataset, the reasoning the LLM can perform is constrained to the parts of the model that are not subject to the straitjacket of the training loss. Therefore most of the reasoning occurs between the first and last layers and ends at the last layer, at which point the produced token must cross the training-loss barrier. Tokens that invest in the future but are not in the dataset get rejected, which limits the LLM's ability to reason.
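
To make that constraint concrete, here is a toy numeric illustration (mine, not a measurement of any real model): next-token cross-entropy is scored against the dataset's exact continuation, so any probability mass diverted to a hypothetical scratch "thinking" token is penalized directly, even if it would help derive the answer.

```python
# A toy illustration (not a measurement of any real model): next-token
# cross-entropy is scored against the dataset's exact continuation, so
# probability mass diverted to a scratch "thinking" token is penalized
# even if it would help derive the answer.
import math

def cross_entropy(step_probs, target):
    """Sum of -log p(target_t) across the sequence."""
    return sum(-math.log(p[t]) for p, t in zip(step_probs, target))

# The dataset continuation is just the answer token "42" (index 1);
# index 0 stands for a hypothetical intermediate "think" token.
target = [1]
honest  = [{0: 0.05, 1: 0.95}]   # mass on the dataset's token
scratch = [{0: 0.60, 1: 0.40}]   # mass diverted to the scratch token

print(f"{cross_entropy(honest, target):.3f}")   # ~0.051: low loss
print(f"{cross_entropy(scratch, target):.3f}")  # ~0.916: diverted mass punished
```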

TeMPOraL|1 year ago

> People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions

And almost all of it is just more text, or described in more text.

You're very much right about this. And that's exactly why LLMs work as well as they do - they're trained on enough text, of all kinds and on all topics, that they get to pick up on all sorts of patterns and relationships, big and small. The meaning of a word isn't embedded in the letters that make it up, but in the other words and experiences associated with it - and that is exactly what language models are mapping.
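
As a toy illustration of that distributional view (my sketch; the corpus and window size are arbitrary assumptions), vectors built purely from co-occurrence counts already place "cat" closer to "dog" than to "milk", with no access to the letters at all:

```python
# A toy distributional-semantics sketch (corpus and window size are
# arbitrary assumptions): each word's vector is just its co-occurrence
# counts with other words, yet "cat" already lands closer to "dog" than
# to "milk" -- meaning from context, not from letters.
from collections import Counter
import math

corpus = ("the cat drank milk . the dog drank water . "
          "the cat chased the dog . the dog chased the cat").split()
vocab = sorted(set(corpus))

def vector(word, window=2):
    """Counts of words appearing within `window` positions of `word`."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            start, stop = max(0, i - window), min(len(corpus), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[corpus[j]] += 1
    return [counts[v] for v in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine(vector("cat"), vector("dog")))   # higher: shared contexts
print(cosine(vector("cat"), vector("milk")))  # lower: fewer shared contexts
```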

dimask|1 year ago

It is not "just more text". That is an extremely reductive view of human cognition and experience, and it does justice to nothing. Describing things in text collapses too many dimensions. Human cognition is multimodal. Humans are not computational machines; we are attuned to, and in a constant allostatic relationship with, the changing world around us.

whyever|1 year ago

I think there is a component of memorizing solutions. For example, for mathematical proofs there is a set of standard "tricks" that you should have memorized.
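
One classic instance of such a trick (my example, not the commenter's): the add-and-subtract step in the product-rule proof, where you insert a pair of terms summing to zero so the difference quotient splits into two limits you already know.

```latex
% The add-and-subtract trick: insert -f(x+h)g(x) + f(x+h)g(x) (zero in
% disguise) so the product-rule difference quotient splits into two
% limits that are already known.
\[
\frac{f(x+h)g(x+h) - f(x)g(x)}{h}
  = f(x+h)\,\frac{g(x+h) - g(x)}{h} + g(x)\,\frac{f(x+h) - f(x)}{h}
\]
```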

shkkmo|1 year ago

Sure, memory helps a lot; it allows you to concentrate your mental effort on the novel or unique parts of the problem.