abstractbg | 4 months ago

The analysis happens on the AI server.

Without proper profiling, my guess is that the CPU going wild during analysis comes from a combination of three things: (1) the analysis is streamed live to the client in 20-simulation intervals, (2) there is some post-processing on the client side, and (3) I am using a global context and reducer in React, which causes the entire page to re-render each time an update arrives.
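To make point (1) concrete, here is a minimal sketch of interval-based streaming, assuming one update is pushed every `interval` simulations. The name `stream_analysis`, its parameters, and the total of 1000 simulations are hypothetical, chosen for illustration; this is not the site's actual code. With a global context, each yielded update would correspond to one full-page re-render on the client.

```python
from typing import Iterator


def stream_analysis(total_simulations: int, interval: int = 20) -> Iterator[int]:
    """Yield the running simulation count each time an update would be sent.

    Hypothetical sketch: the real search step is omitted, only the
    update cadence is modeled.
    """
    for done in range(1, total_simulations + 1):
        # One search simulation would run here (omitted in this sketch).
        if done % interval == 0 or done == total_simulations:
            yield done


# Compare how many updates the client receives at two different intervals.
for interval in (20, 100):
    updates = list(stream_analysis(1000, interval=interval))
    print(f"interval={interval}: {len(updates)} updates")
```

Under these assumptions, widening the interval from 20 to 100 simulations cuts the number of updates (and so re-renders) by 5x, at the cost of a less live display.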

The networks are simple ResNets with a value head and a policy head: 20 layers with 128 channels per layer. I trained for several days on 2x 4090s. More recently, I trained a few networks (Hex 14x14, Amazons 10x10, Breakthrough 8x8) on a GH200 and it was about 2x faster, roughly 100 checkpoints per 24 hours for Hex 14x14. I'm not sure about the number of parameters, but the .pt and .ts files are on the order of 30-90 MB. There's definitely room for improvement using tricks like quantization during self-play inference.
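A rough back-of-envelope for the parameter count of the trunk alone, under assumptions that are not stated above: 3x3 bias-free convolutions, one batch norm (scale and shift) per convolution, and float32 weights. The input layer, both heads, and anything else stored in the files are not counted, so this is not expected to reproduce the 30-90 MB figure.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Number of weights in a bias-free 2D convolution."""
    return in_ch * out_ch * kernel * kernel


def trunk_params(num_convs: int = 20, channels: int = 128, kernel: int = 3) -> int:
    """Weights in a stack of same-width convolutions, each with a batch norm."""
    # Assumed: batch norm adds one scale and one shift per channel.
    per_layer = conv_params(channels, channels, kernel) + 2 * channels
    return num_convs * per_layer


# 20 convs if "layers" means convolutions, 40 if it means two-conv residual blocks.
for num_convs in (20, 40):
    params = trunk_params(num_convs=num_convs)
    size_mb = params * 4 / 1e6  # float32 uses 4 bytes per parameter
    print(f"{num_convs} convs: {params:,} parameters, about {size_mb:.1f} MB")
```

Whether "20 layers" means 20 convolutions or 20 residual blocks changes the estimate by 2x, which is why both are printed.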

I'm very happy you like Tumbleweed! If you're curious, there's a Tumbleweed community run by Michał (the creator): https://discord.com/invite/wu6Xdtt497 They are currently playing through their 2025 World Championship.

tasuki|4 months ago

Thank you so much for the detailed answer!