top | item 40069841

yanniszark | 1 year ago

This is fascinating! I thought only reinforcement learning was doing things like this, but you're saying you can do this via fuzzing? What does this mean exactly? How is it able to learn to advance through all these levels? Is there an underlying learning mechanism at play?

infogulch|1 year ago

It appears that you are not familiar with the concept of fuzzing.

Fuzzing is a moderately advanced software testing technique popularized in the '90s that operates on a very simple idea: if you feed arbitrary/random data into a program's inputs, you can discover bugs in the program with little human effort.

In the '90s they fed random data into the stdin of Unix utilities and found that many programs crashed. [0] In this context, printing an error message that says "I can't interpret the input" is a valid state, but reading past the end of a buffer because the input confused the program is a bug. Variants can be designed to test any API layer.
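That '90s-style "dumb" fuzzing can be sketched in a few lines. The `parse_length` target below is a hypothetical stand-in for a program under test, deliberately written with the kind of missing bounds check the technique is good at finding:

```python
import random

def parse_length(data: bytes) -> int:
    # Hypothetical buggy target: reads a length byte, then indexes
    # past the end of the buffer without a bounds check.
    if not data:
        raise ValueError("can't interpret the input")  # graceful error: fine
    n = data[0]
    return data[1 + n]  # bug: IndexError when the buffer is shorter than claimed

def dumb_fuzz(target, trials=1000):
    """Throw random bytes at the target; collect anything that isn't a clean error."""
    rng = random.Random(0)  # fixed seed for reproducibility
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(16)))
        try:
            target(data)
        except ValueError:
            pass  # "I can't interpret the input" is a valid state
        except Exception as exc:
            crashes.append((data, exc))  # everything else is a bug
    return crashes
```

Even this naive loop finds the out-of-bounds read almost immediately, which is roughly what happened with the Unix utilities.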

More recently, coverage-guided fuzzers use information about which code paths are executed for each input to reach a variety of program states more quickly. Starting from a prefix known to produce an interesting state can also speed up testing.
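A toy sketch of the coverage-guided idea: the fake `target` here simply reports which branch ids an input hit (real fuzzers like AFL collect this via instrumentation), and the loop keeps any mutant that reaches a branch not seen before, using it as a seed for further mutation:

```python
import random

def target(data: bytes) -> set:
    # Toy "instrumented" target: returns the set of branch ids the input hit.
    hits = set()
    if data[:1] == b"F":
        hits.add("F")
        if data[1:2] == b"U":
            hits.add("FU")
            if data[2:3] == b"Z":
                hits.add("FUZ")
    return hits

ALPHABET = b"FUZABC"  # small mutation alphabet keeps the toy example fast

def coverage_guided_fuzz(rounds=2000, seed=0):
    rng = random.Random(seed)
    corpus = [b"A"]  # seed corpus: a known starting input
    seen = set()     # all branch ids reached so far
    for _ in range(rounds):
        b = bytearray(rng.choice(corpus))
        if b and rng.random() < 0.5:
            b[rng.randrange(len(b))] = rng.choice(ALPHABET)  # flip a byte
        else:
            b.append(rng.choice(ALPHABET))                   # grow the input
        cov = target(bytes(b))
        if not cov <= seen:  # reached a new code path: keep this input
            seen |= cov
            corpus.append(bytes(b))
    return seen
```

A purely random fuzzer would need ~256^3 tries to stumble on "FUZ"; the coverage feedback lets the loop climb toward it one branch at a time.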

I wrote a comment relating this to the article and talk in the OP here: https://news.ycombinator.com/item?id=40068187#40071950

vojev|1 year ago

There's no learning exactly; as the post explains, the fuzzer is aware of various RAM addresses (and has a tactic for how it "presses" buttons in the game). It's just trying to explore the space of Mario's level plus his x and y coordinates.
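The exploration loop can be sketched roughly like this. Everything here (`ToyGame`, `explore`, the movement rules) is invented for illustration; the real setup drives an actual emulator, but the idea is the same: save snapshots, replay short bursts of random button presses, and keep any snapshot that lands in a novel (level, x, y) state read out of RAM:

```python
import random

class ToyGame:
    """Hypothetical stand-in for an emulator exposing RAM reads and snapshots."""
    def __init__(self):
        self.x, self.y, self.level = 0, 0, 1
    def snapshot(self):
        return (self.x, self.y, self.level)
    def restore(self, snap):
        self.x, self.y, self.level = snap
    def step(self, button):
        if button == "right":
            self.x += 1
        elif button == "left":
            self.x = max(0, self.x - 1)
        elif button == "jump":
            self.y = (self.y + 1) % 3
        if self.x >= 20:  # reaching the flag advances the level
            self.x, self.level = 0, self.level + 1
    def read_state(self):
        # "RAM addresses" of interest: the level plus Mario's x/y coordinates
        return (self.level, self.x, self.y)

def explore(rounds=2000, seed=0):
    rng = random.Random(seed)
    game = ToyGame()
    frontier = [game.snapshot()]     # snapshots worth exploring from
    seen = {game.read_state()}       # (level, x, y) states reached so far
    for _ in range(rounds):
        game.restore(rng.choice(frontier))
        for _ in range(rng.randrange(1, 5)):  # a short burst of random inputs
            game.step(rng.choice(["right", "jump", "left"]))
        state = game.read_state()
        if state not in seen:        # novel (level, x, y): keep this snapshot
            seen.add(state)
            frontier.append(game.snapshot())
    return max(level for level, _, _ in seen)
```

No reward signal, no gradient: snapshots that happen to be further along get saved and re-expanded, so progress compounds even though every individual input is random.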

(I'm an Antithesis employee.)

nextaccountic|1 year ago

This means that, without a learning procedure to direct Mario towards the end of the level, it can only reach the end by itself because the levels (and Mario's in-memory data structures in general) are pretty small, right?

Or rather, if there were tons of irrelevant state, it could always end up trapped somewhere and never actually complete a level even after centuries of fuzzing.

Something similar was tested in the Twitch Plays Pokémon [0] gaming experiment, but there the inputs appeared random without actually being random: there were "factions" that either tried to sabotage the run or tried to make it progress. Ultimately the majority of the players were cooperating to complete the game, and this was a deciding factor in the run's success. Maybe fuzzing Pokémon couldn't complete the game the way TPP did (or reinforcement learning could).

[0] https://en.wikipedia.org/wiki/Twitch_Plays_Pok%C3%A9mon