vojev | 1 year ago

There's no learning, exactly. As the post explains, the fuzzer is aware of various RAM addresses (and has a tactic for how it "presses" buttons in the game). It's just trying to explore the space of Mario's level plus his x and y coordinates.
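To make "exploring the space of level + x + y" concrete, here is a toy Python sketch of tracking which (level, x, y) states have been seen. The RAM offsets and the names `read_state` and `Coverage` are made up for illustration; they are not the game's real memory map or anything taken from the post.

```python
from typing import NamedTuple

# Hypothetical RAM offsets, chosen only for illustration.
# They are NOT the real game's memory map.
LEVEL_ADDR = 0x10
X_ADDR = 0x11
Y_ADDR = 0x12


class State(NamedTuple):
    level: int
    x: int
    y: int


def read_state(ram: bytes) -> State:
    """Project the full RAM snapshot down to the few values we care about."""
    return State(ram[LEVEL_ADDR], ram[X_ADDR], ram[Y_ADDR])


class Coverage:
    """Remembers every (level, x, y) tuple observed so far."""

    def __init__(self) -> None:
        self.seen: set[State] = set()

    def observe(self, ram: bytes) -> bool:
        """Return True if this snapshot shows a state not seen before."""
        state = read_state(ram)
        if state in self.seen:
            return False
        self.seen.add(state)
        return True
```

Projecting the RAM down to three values is what keeps the space manageable in this sketch: two snapshots that differ only in other bytes count as the same state.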

(I'm an Antithesis employee.)

nextaccountic|1 year ago

So without a learning procedure to direct Mario toward the end of the level, the fuzzer can only reach the end on its own because the levels (and Mario's in-memory data structures in general) are pretty small, right?

Or rather, if there were tons of irrelevant state, it could keep getting trapped somewhere and never actually complete a level, even after centuries of fuzzing.

Something similar was tested in the Twitch Plays Pokemon [0] gaming experiment, but there the inputs only appeared random: there were "factions" that tried either to sabotage the run or to make it progress. Ultimately, the majority of the players cooperated to complete the game, and that was the deciding factor in the run's success. Maybe fuzzing can't complete Pokemon the way that TPP could (or reinforcement learning could).

[0] https://en.wikipedia.org/wiki/Twitch_Plays_Pok%C3%A9mon

vojev|1 year ago

The space is large. It just turns out that if you direct Mario to explore with a bit of bias (e.g., generally favoring exploration from states where Mario's x coordinate is further to the right), it completes the levels.
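As a rough illustration of that kind of bias, here is a toy Python sketch, not our actual implementation. The one-dimensional "level", the `explore` function, and the linear weighting `1 + bias * x` are all assumptions made up for this example. It keeps every state reached so far, picks one to resume from with a preference for larger x, and then applies a short burst of random inputs.

```python
import random


def explore(goal: int, max_iters: int, seed: int = 0, bias: float = 1.0) -> int:
    """Toy biased exploration on a 1-D level; returns the furthest x reached."""
    rng = random.Random(seed)
    seen = {0}  # every x position reached so far; Mario starts at x = 0
    for _ in range(max_iters):
        states = sorted(seen)
        # States further to the right get proportionally more weight.
        weights = [1.0 + bias * x for x in states]
        x = rng.choices(states, weights=weights, k=1)[0]
        # Resume from the chosen state and "press" four random inputs.
        for _ in range(4):
            x = min(goal, max(0, x + rng.choice((-1, 0, 1))))
            seen.add(x)
        if goal in seen:
            break
    return max(seen)
```

For example, `explore(goal=30, max_iters=20000)` walks the toy level to its end. A level this small would also fall to unbiased selection (`bias=0.0`), so the sketch only shows the mechanism, not why the bias matters in a much larger state space.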

I think Pokemon could be beaten with our techniques. Final Fantasy on NES poses similar problems to Pokemon, and some progress has been made on that game in the past, here.