top | item 11815237

lionleaf | 9 years ago

Note that in this Pong example in particular, every frame he gives the network an option to go either up or down, but no option to stand still. So it _has_ to be jittery.

But as other people have commented, adding a small penalty on every move and giving it the option to stand still (along with some normalization?) might give a much smoother result.
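To make that concrete, here is a rough sketch of what I mean, not the article's code. It assumes the network emits three logits instead of one probability, and `ACTIONS`, `MOVE_PENALTY`, `sample_action` and `shaped_reward` are names I made up for illustration. The penalty value is a guess and would need tuning.

```python
import math
import random

# Hypothetical action set: STAY is the extra option suggested above.
ACTIONS = ("UP", "DOWN", "STAY")
# Assumed penalty per move; a made-up value that would need tuning.
MOVE_PENALTY = 0.01


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample one of UP / DOWN / STAY from the softmax distribution."""
    probs = softmax(logits)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


def shaped_reward(game_reward, action):
    """Subtract a small cost whenever the paddle moves."""
    penalty = MOVE_PENALTY if action != "STAY" else 0.0
    return game_reward - penalty
```

The shaped reward would replace the raw game reward when computing returns, so standing still is slightly preferred whenever moving doesn't help.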


tripzilch | 9 years ago

I went back to that part of the article several times, because like you, I found it a bit odd that there was no option to stand still.

The first part of the article definitely seems to give the impression that the choice is between UP or DOWN every frame.

But a bit further on it became more ambiguous, and I could also interpret it as giving one probability for UP and a separate one for DOWN. Then it could also choose neither. But then it could also choose both, and you'd need a conflict resolution procedure (do neither, pick the one with the highest probability, maybe just roll again?). Unless the actual game also has two buttons and you can just do whatever the game engine does if you press both.
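A minimal sketch of that second interpretation, purely as an illustration and not what the article does: two independent probabilities are sampled separately, and the "both pressed" case is resolved by one of the three procedures I listed. The function names, the policy strings and the `max_rolls` cap are all my own inventions.

```python
import random


def sample_buttons(p_up, p_down, rng):
    """Sample the two buttons independently of each other."""
    return rng.random() < p_up, rng.random() < p_down


def choose_action(p_up, p_down, policy="neither", rng=random, max_rolls=100):
    """Pick UP, DOWN or STAY from two independent button probabilities.

    policy decides what happens when both buttons come up:
      "neither" -> stand still
      "highest" -> take the button with the higher probability
      "reroll"  -> sample again (capped at max_rolls, then stand still)
    """
    if policy not in ("neither", "highest", "reroll"):
        raise ValueError(f"unknown policy: {policy}")

    up, down = sample_buttons(p_up, p_down, rng)

    if policy == "reroll":
        rolls = 1
        while up and down and rolls < max_rolls:
            up, down = sample_buttons(p_up, p_down, rng)
            rolls += 1

    if up and down:
        if policy == "highest":
            return "UP" if p_up >= p_down else "DOWN"
        return "STAY"  # "neither", or "reroll" that ran out of rolls
    if up:
        return "UP"
    if down:
        return "DOWN"
    return "STAY"  # neither button was sampled
```

Note that with this setup standing still falls out for free whenever neither button is sampled, so no third output is needed.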

Another possibility might be to model the output a bit more like a human player would do it. First I'd change it into a series of timings + note-on/note-off commands (like MIDI), then perhaps add jitter to the timings (making sure the note-off doesn't jitter before the corresponding note-on). I've read that adding this kind of noise to a NN tends to improve its robustness, so that might help?
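Roughly what I have in mind, as a sketch under my own assumptions: per-frame actions get collapsed into on/off events with frame timings, and then each timing is shifted by a small random amount. The event tuple layout, `max_jitter`, and both function names are hypothetical. Clamping each time to be no earlier than the previous event is just one simple way to keep a note-off from landing before its note-on.

```python
import random


def frames_to_events(actions):
    """Collapse per-frame actions into (frame, "on"/"off", button) events."""
    events = []
    held = None  # button currently held down, if any
    for t, action in enumerate(actions):
        button = action if action in ("UP", "DOWN") else None
        if button != held:
            if held is not None:
                events.append((t, "off", held))
            if button is not None:
                events.append((t, "on", button))
            held = button
    if held is not None:
        # release whatever is still held at the end of the episode
        events.append((len(actions), "off", held))
    return events


def jitter_events(events, max_jitter=2, rng=random):
    """Shift each timing by a random whole number of frames.

    Times are clamped so they never go below zero or before the
    previous event, which keeps every "off" at or after its "on".
    """
    jittered = []
    last_t = 0
    for t, kind, button in events:
        new_t = max(last_t, t + rng.randint(-max_jitter, max_jitter))
        jittered.append((new_t, kind, button))
        last_t = new_t
    return jittered
```

For example, `frames_to_events(["UP", "UP", "STAY", "DOWN"])` gives an on/off pair for UP followed by an on/off pair for DOWN, and `jitter_events` then nudges those four timings while preserving their order.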

Most of those changes would happen as a transformation step between the output layer and the simulation input, so I presume the learning algo itself can mostly stay the same. But there are probably a few snags to that as well.