fdalvi's comments

fdalvi | 7 months ago | on: GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2

The code snippets indeed don't make this clear. In a regular feedforward layer, it is common to choose "hidden_dim = 4 x emb_dim", while in a GLU feedforward layer the convention is to use "hidden_dim = 2/3 x regular_ffn_hidden_dim". A GLU layer has three weight matrices instead of two, so this keeps the overall number of parameters roughly the same. In the case of gpt-oss, they chose to go a bit more extreme and set "hidden_dim = emb_dim", thus reducing the overall number of parameters!
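For anyone who wants to check the arithmetic, here is a rough sketch of the weight counts for a single feedforward block. The function names are made up for illustration, biases are ignored, and emb_dim = 768 is just an example value, not the gpt-oss setting.

```python
def ffn_params(emb_dim: int, hidden_dim: int) -> int:
    """Weights in a regular FFN: one up projection and one down projection."""
    return 2 * emb_dim * hidden_dim


def glu_ffn_params(emb_dim: int, hidden_dim: int) -> int:
    """Weights in a GLU FFN: gate, up, and down projections."""
    return 3 * emb_dim * hidden_dim


emb_dim = 768                          # illustrative value only
regular_hidden = 4 * emb_dim           # hidden_dim = 4 x emb_dim
glu_hidden = 2 * regular_hidden // 3   # hidden_dim = 2/3 x regular_ffn_hidden_dim

regular = ffn_params(emb_dim, regular_hidden)
glu_conventional = glu_ffn_params(emb_dim, glu_hidden)
glu_small = glu_ffn_params(emb_dim, emb_dim)   # hidden_dim = emb_dim

print("regular FFN:          ", regular)
print("GLU, 2/3 convention:  ", glu_conventional)
print("GLU, hidden = emb_dim:", glu_small)
print("ratio small / regular:", glu_small / regular)
```

In general the regular layer has 8 x emb_dim^2 weights, the conventional GLU layer has 3 x emb_dim x (8/3 x emb_dim) = 8 x emb_dim^2, and the "hidden_dim = emb_dim" variant has 3 x emb_dim^2, i.e. 3/8 of the regular count.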

fdalvi | 5 years ago | on: Show HN: Quake 1 movement physics reinforcement learning project

It wouldn't be difficult at all if the optimal running technique were known beforehand. I think the goal of many of these RL exercises is to either i) find a better solution than what we may have imagined or ii) confirm that the solution we already knew was indeed the best possible one!