donadigo | 1 year ago

This is an awesome overview and if you want more, most of those are documented in an approachable way on YouTube.

Just wanted to provide some perspective here on how many things those projects need to take care of in order to get some training setup going.

I'm the developer behind TMInterface [1] mentioned in this post, which is a TAS tool for the older TrackMania game (Nations Forever). For Linesight (the last project in this post), I recently ended up working with its developers to provide the APIs they need to access from the game. There are a lot of things RL projects usually want to do: speed up the game (one of the most important), deterministically control the vehicle, get simulation information, navigate menus, skip cutscenes, make save states, capture screenshots, etc. Having each of those implemented natively greatly improves the stability and performance of training/inference in an RL agent; e.g. the latest version of the project uses a direct capture of the surface rendered to the game window instead of an external Python library (DxCam). This is faster, requires no additional setup, and allows training even when the game window is completely occluded by other windows.
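To make the list above concrete, here is a minimal sketch of the environment loop that this kind of native access enables. The `GameClient` class is a stand-in stub, not the real TMInterface API; every name in it is hypothetical and only illustrates which operations (speed control, save states, input injection, frame capture) an RL wrapper typically needs.

```python
import random

class GameClient:
    """Stub standing in for a native game interface (hypothetical API)."""
    def __init__(self):
        self.time = 0
    def set_speed(self, factor):
        # Run the physics simulation faster than realtime.
        self.speed = factor
    def save_state(self):
        # Snapshot the full simulation state for deterministic resets.
        return self.time
    def load_state(self, state):
        # Rewind deterministically to a saved state.
        self.time = state
    def step(self, steer, gas):
        # Advance one physics tick (10 ms) with the given inputs.
        self.time += 10
        return {"speed_kmh": random.uniform(0, 400), "time": self.time}
    def capture_frame(self):
        # Direct capture of the rendered surface (fake pixels here).
        return bytes(64)

class TrackmaniaEnv:
    """Gym-like wrapper: reset via save states, step via native inputs."""
    def __init__(self, client):
        self.client = client
        self.client.set_speed(10.0)          # ~10x realtime
        self.start = self.client.save_state()
    def reset(self):
        self.client.load_state(self.start)
        return self.client.capture_frame()
    def step(self, action):
        steer, gas = action
        info = self.client.step(steer, gas)
        obs = self.client.capture_frame()
        reward = info["speed_kmh"] / 400.0   # toy shaping reward
        done = info["time"] >= 60_000        # 60 s episode cap
        return obs, reward, done, info

env = TrackmaniaEnv(GameClient())
obs = env.reset()
obs, reward, done, info = env.step((0.0, 1.0))
```

Because every operation is native, `reset()` is a cheap state rewind rather than a menu navigation, which is a big part of why training stays fast and stable.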

There are also many other smaller annoying things: many games throttle FPS when the window is unfocused (which is the case here), and the tool patches out this behaviour for the project; there are more things like this. The newest release of Linesight V3 [2] can reliably approach world records, and it's being trained and experimented with by quite a few people. The developers made it easy to set up and documented a lot of the process [3].

[1] https://donadigo.com/tminterface/

[2] https://youtu.be/cUojVsCJ51I

[3] https://linesight-rl.github.io/linesight/build/html/

Daneel_ | 1 year ago

I know your name from falling asleep to Wirtual videos. I think I actually found his content thanks to your collaboration on the cheating scandal. Thanks for all your hard work - it's obvious how significant and beneficial it is within the TM community.

brutus1213 | 1 year ago

Scientist here; I've bookmarked this article for close reading (so apologies if this question is discussed in the article).

I've had a few brushes with RL (with collaborators who knew more RL than I did). A key issue we encountered across different problem settings was the number of samples required to train. We created a headless version of the underlying environment but could not make it run much faster than real-time. We also did some work to parallelize, but it wasn't enough (and it was expensive). Is the TM-related RL training happening in real-time, or is it possible to speed it up? That seemed like the key problem for making RL widely used, but I'm curious about your thoughts.

donadigo | 1 year ago

I'm not sure about your particular case, but if your environment really is headless, then it should absolutely be possible to run it much faster than realtime. It depends on what the environment is and whether you have access to its source code (we don't in TrackMania, so it's a lot harder). Either the environment is deliberately throttling the amount of time it simulates, or simulating it simply takes so long that it can't be sped up any further.

We're lucky in the case of TrackMania because it internally has systems both to set the relative game speed and to completely disable all rendering and just run physics. Linesight achieves about a 10x speedup, and most of the time is now spent rendering game frames and running inference on the network. They also parallelize training by running more game instances and implementing a training queue. For the "raw" speedup ratios, TM usually achieves about 60x (one minute is simulated in one second), and I use this speedup to implement the bruteforce functionality in the tool (coupled with a custom save-states implementation).
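The "more game instances plus a training queue" pattern can be sketched as follows. This is a hypothetical toy, not Linesight's actual code: the real project drives separate game processes, while here threads with a stub rollout function stand in for them, all pushing transitions into one shared queue that a trainer drains into batches.

```python
import queue
import random
import threading

transitions = queue.Queue()

def rollout_worker(worker_id, episodes):
    """Each worker drives one game instance and pushes transitions."""
    rng = random.Random(worker_id)
    for _ in range(episodes):
        for _ in range(5):  # toy 5-step episodes
            transitions.put({
                "worker": worker_id,
                "obs": rng.random(),                # fake observation
                "action": rng.choice([-1, 0, 1]),   # steer left/none/right
                "reward": rng.random(),
            })

# Run 4 "game instances" in parallel, 2 episodes each.
workers = [threading.Thread(target=rollout_worker, args=(i, 2))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# The trainer drains the shared queue into a batch for a gradient step.
batch = []
while not transitions.empty():
    batch.append(transitions.get())
```

The design point is that data collection and learning are decoupled: slow environments are simply replicated, and the queue evens out their throughput for the trainer.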

msephton | 1 year ago

It's possible to speed it up by running the game as fast as it can go (so, not limited as it normally is for human consumption). They talk about running it at 9x speed, so months of training could be done in 80 hours.
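A quick back-of-the-envelope check of those figures (the 4-instance count below is an assumed example, not a number from the project):

```python
speedup = 9        # game running at 9x realtime
wall_hours = 80    # wall-clock training budget

sim_hours = speedup * wall_hours   # simulated hours of driving
sim_days = sim_hours / 24          # about a month from speedup alone

# Running several game instances in parallel multiplies this again,
# e.g. an assumed 4 instances gives roughly four months of experience
# in the same 80 wall-clock hours.
instances = 4
total_sim_days = sim_days * instances
```

So the "months of training in 80 hours" figure plausibly combines the per-instance speedup with parallel instances.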