top | item 40584614


laurencei | 1 year ago

I'm not a machine learning person - so I'm confused about this.

As someone who doesn't understand ML - I have always assumed the whole point of ML is to try different things in the game, almost randomly, and over (long) periods of time the AI gets better and better at the game.

If a single unexpected event causes such a large swing in outcome, and the AI can't "explain" what changed to cause the swing, then what exactly is the ML doing for it to fail on such a seemingly simple change? Doesn't that defeat the whole purpose of this?

I'm obviously missing something obvious - because I would assume the real goal of ML is that it can teach itself the game, even if that involves unexpected situations, as a human does?


schattschneider|1 year ago

The article doesn't describe it in detail. One imaginable scenario would be that they ran their model, trained on non-full-moon data, for an evaluation on a full-moon day. That would mean the model simply applies its learned "optimal" action policy in a different environment, where that previously learned policy no longer leads to good scores.
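
A toy sketch of that scenario (my own construction, not from the article): a trivial two-armed bandit "trained" in one reward environment, then scored in a shifted one. The environments and numbers are invented purely for illustration.

```python
import random

def train_greedy_policy(rewards, episodes=1000, seed=0):
    """Estimate each arm's mean reward by sampling, return the greedy arm."""
    rng = random.Random(seed)
    totals = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(episodes):
        arm = rng.randrange(2)
        totals[arm] += rng.gauss(rewards[arm], 1.0)  # noisy reward draw
        counts[arm] += 1
    means = [t / c for t, c in zip(totals, counts)]
    return max(range(2), key=lambda a: means[a])

train_env = [5.0, 3.0]   # arm 0 is best during training ("no full moon")
eval_env  = [1.0, 4.0]   # reward structure shifts at evaluation ("full moon")

arm = train_greedy_policy(train_env)
print("greedy arm:", arm)
print("that arm's mean reward at eval time:", eval_env[arm])
```

The learned policy is still doing exactly what it was trained to do; the score drops only because the environment it is scored in no longer matches the one it learned from.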

laurencei|1 year ago

So does this mean if they allowed the game to run on "full moon days", it would be expected to eventually get a higher score (if the full moon day allowed that through the actual game mechanism)?

sigmoid10|1 year ago

It's not a single event, it's more like a new general game state that was never seen during training. Imagine learning to play the violin really well and then someone changes the room's acoustics. It doesn't matter if you're a human or an ML algorithm, you're going to have a hard time playing like before.

queuebert|1 year ago

But something is wrong in the learning, because as a human NetHack player who has ascended, I can say that we don't play radically differently on full moons. Yes, the random numbers go your way slightly more, but that's about it.

This tells me the algo is trying too hard to predict the game or learn a decent static strategy, rather than make situational decisions.

stetrain|1 year ago

Humans train simultaneously as they operate, and humans can see the message about the full moon.

If the full moon message isn't included as an input to the ML model, and the model is run in full-moon mode with the training it accumulated in non-full-moon mode, its score in full-moon mode may well be lower.

Even if it had proportional training time in full-moon mode to incorporate that into the model, if you don't tell it when full-moon mode is active, wouldn't the optimal behavior be to optimize the score for the 27/28 days rather than the 1/28 days of the month?

If full-moon mode is an input to the model, then it can be trained to optimize for both scenarios.
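
As a minimal sketch of what "flag as input" buys you (toy tabular policy, my own construction, reusing the invented scores from above): once the mode is part of the observed state, the learner can keep a separate best action per mode instead of one compromise action.

```python
# Q-values learned with the full-moon flag included as part of the state:
q_table = {
    "normal":    {"aggressive": 100, "cautious": 60},
    "full_moon": {"aggressive": 10,  "cautious": 60},
}

def best_action(q_table, mode):
    """Greedy action for the observed mode."""
    return max(q_table[mode], key=q_table[mode].get)

print(best_action(q_table, "normal"))     # -> aggressive
print(best_action(q_table, "full_moon"))  # -> cautious
```

The same learner that was forced into one average-maximizing strategy above can now switch behavior the moment the flag flips.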

laurencei|1 year ago

So for ML to work, it has to know all permutations of a game? Does that mean ML is useless for non-deterministic games with random outcomes or procedural generation?