top | item 42991131


FieryTransition | 1 year ago

Thanks a lot for the detailed reply, it was better than I had hoped for :)

So knowledge transfer is something incredibly specific and much narrower than what I thought. They don't transfer concepts by generalization; they compress knowledge instead. I assume the difference is that generalization is much more fluid, while compression is much more static, like a dictionary where each key has a probability of being chosen and all the relationships are frozen. The only generalization that happens is whatever the training method expresses, since the training method freezes its "model of the world" into the weights, so to speak. So if the training method itself cannot generalize, but only compress, why would the resulting model that the training method produces? Is that understood correctly?
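The "frozen dictionary" picture can be made concrete with a toy sketch (the corpus and function names here are made up for illustration): a bigram count table is "trained" once and then frozen, so inference is pure lookup, and anything outside the table simply has no answer.

```python
from collections import Counter, defaultdict

# Toy "frozen dictionary" model: training = counting bigrams, then freezing.
# All relationships are fixed at training time; inference can only look up.
corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(word):
    c = counts.get(word)
    if c is None:
        return {}  # unseen key: the frozen table cannot generalize to it
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(next_token_probs("the"))  # {'cat': 0.666..., 'mat': 0.333...}
print(next_token_probs("dog"))  # {} -- nothing exists outside the frozen table
```

Real networks interpolate between seen points rather than returning nothing, but the frozen-at-training-time property is the same: the weights do not rearrange at inference.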

Does there exist a computational model that can be used to analyse a training method and put a bound on the expressiveness of the resulting model?

It's fascinating that the emergent abilities of models disappear if you measure them differently. I guess the difference is that "emergent abilities" are kind of nonsensical, since they come with no explanation of causality (i.e. it "just" happens), while seeing the model get linearly better with training fits into a much saner framework. That is, like you said, when your success metric measures discretely, you also see the model itself as discrete, and it hides the continuous hill climbing the model would otherwise exhibit under a non-discrete metric.

But the model still gets better over time, so would you expect the model to get progressively worse on a more generalized metric, or does it only relate to the spikes in the graph that they talk about? I.e., they answer the question of why jumps in performance are not emergent, but they don't answer why the performance keeps increasing, even if it is linear, or whether that is detrimental to other, less related tasks.

And if you wanted to test "emergence", wouldn't it be more interesting to test the model on tasks that are much more unrelated to the task at hand? That would test generalization, more in the way we see humans do it. So it wouldn't really be emergence, but generalization of concepts?

It makes sense that it is more straightforward to refute a claim by contradiction. Would it be good practice for papers to try to refute their own claims by contradiction first? I guess that would save a lot of time.

The knowledge leakage point is interesting, because I was thinking about the concept of world simulations and using models to learn about scenarios through simulation and consequence. But the act of creating a model to perceive the world taints the model itself with bias, so the difficulty lies in creating a model that can rearrange itself to get rid of incorrect assumptions while disconnecting from its initial inherent bias. I thought about models which can create other models, etc., but then how does the model itself measure success? If everything is changing, then so is the metric, so the model could decide to change what it measures as well. I thought about hard-coding a metric into the model, but what if the metric I choose is bad? Then we are stuck with the same problem of bias. So it seems like there are only two options: it either converges towards total uncontrollability or it is inherently biased; there doesn't seem to be any in-between.

I admit I'm trying to learn about ML because I just find general intelligence research fascinating (neuroscience as well), but the more I learn, the more I realize I should really go back to the fundamentals and build up. Even things which seem to make sense on a surface level really have a lot of meaning behind them, and need a well-built intuition not just on a practical level, but on a theoretical level.

In the papers I've read that I find interesting, it's like there's always the right combination of creativity in thinking. Sometimes my intuition/curiosity about things proved right, but I lack the deeper understanding, which can lead to false confidence in results.


godelski | 1 year ago

Well fuck... My comment was too long... and it doesn't get cached -___-

I'll come back and retype some of what I said but I need to do some other stuff right now. So I'll say that you're asking really good questions and I think you're mostly understanding things.

So to give you very quick answers:

Yes, things are frozen. There's active/online learning but even that will not solve all the issues at hand.

Yes, we can put bounds. Causal models naturally do this but statistics is all about this too. Randomness is a measurement of uncertainty. Note that causal models are essentially perfect embeddings. Because if you've captured all causal relationships, you gain no more value from additional information, right?
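The "no more value from additional information" point can be sketched with conditional entropy (a toy, made-up example: X causes Y deterministically, Z is independent noise). Once the causal parent X is known, H(Y | X) is already zero, so conditioning on Z as well cannot reduce it further.

```python
import math
import random
from collections import Counter

random.seed(0)

# If X causes Y deterministically, H(Y | X) = 0: once all causal parents
# are known, extra variables (here Z) carry no additional information.
N = 10_000
data = []
for _ in range(N):
    x = random.randint(0, 3)
    z = random.randint(0, 3)   # independent noise, unrelated to Y
    y = (x * 2 + 1) % 4        # Y is fully determined by X
    data.append((x, z, y))

def cond_entropy(pairs):
    """H(Y | C) in bits, estimated from (context, y) pairs."""
    joint = Counter(pairs)
    ctx = Counter(c for c, _ in pairs)
    h = 0.0
    for (c, y), n in joint.items():
        p_joint = n / len(pairs)
        p_y_given_c = n / ctx[c]
        h -= p_joint * math.log2(p_y_given_c)
    return h

h_given_x = cond_entropy([(x, y) for x, z, y in data])
h_given_xz = cond_entropy([((x, z), y) for x, z, y in data])
print(h_given_x, h_given_xz)  # both 0.0: Z adds nothing once X is known
```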

Also note that we have to be very careful about assumptions. It is always important to uncover what assumptions have been made and what their implications are. This is useful in general problem solving and applies to anything in your life, not just AI/ML/coding. Unfortunately, assumptions are almost never explicitly stated, so you've got to go hunting.

See how physics defines strong emergence and weak emergence. There are no known strongly emergent phenomena, and we generally believe they do not exist. As for weak emergence, well, it's rather naive to discuss this in the context of ML if we're dedicating so little time and effort to interpretation, right? That's kind of the point I was making previously about not being able to differentiate an emergent phenomenon from not knowing we gave it information.

For the "getting better" it is about the spikes. See the first two figures and their captions in the response paper.

More parameters do help btw, but make sure you distinguish between a problem being easier to solve and a problem not being solvable. The latter is rather hard to show. But the paper provides strong evidence that the underlying issue is about the ease of problem solving rather than incapacity.

Proof is hard. There's nothing wrong with being empirical, but we need to understand that this is a crutch. It is evidence, not proof. We leaned on this because we needed to start somewhere. But as progress is made so too must all the metrics and evaluations. It gets exponentially harder to evaluate as progress is made.

I do not think it is best to put everyone in ML into theory first and act like physicists. Rather, we should recognize the noise and not lock others out from researching other ideas. The review process has been contaminated and we've lost sight. I'd say the problem is that we look at papers as if we are looking at products. But in reality, papers need to be evaluated with an understanding of the experimental framework: what question is being addressed, are variables being properly isolated, and do the results make a strong case for the conclusion? If we're benchmark chasing we aren't doing this, and we're handing a massive advantage to the "gpu rich", as they can hyper-parameter tune their way to success. We're missing a lot of understanding because of this. You don't need state of the art to prove a hypothesis, nor to make improvements on architectures or in our knowledge. Benchmarks are very lazy.

For information leakage, you can never remove the artist from the art, right? They always leave part of themselves. That's okay, but we must be aware of the fact so we can properly evaluate.

Take the passion, and dive deep. Don't worry about what others are doing, and pursue your interests. That won't make you successful in academia, but it is the necessary mindset of a researcher. Truth is, no one knows where we're going and which rabbit holes are dead ends (or which look like dead ends but aren't). It is good to revisit, because you table questions when learning and then forget to come back to them.

  > needs a well-built intuition not from a practical level, but from a theoretical level.

The magic is at the intersection. You need both; you cannot rely on only one. This is a downfall in the current ML framework, and many things are black boxes only because no one has bothered to look.