Show HN: Aion-Torch – Adaptive residual scaling for deep Transformers
2 points | Rioverde | 3 months ago | github.com
The repo has a drop-in AionResidual module, some basic tooling for logging what’s happening inside the network, and small examples showing how to plug it into existing models. I’d love feedback on three things: whether this idea makes sense beyond toy setups, how you would benchmark it against standard residuals and DeepNorm on real tasks, and whether the API feels natural to people who train larger models.
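For anyone setting up the DeepNorm side of such a comparison, here is a minimal, dependency-free sketch of that baseline. It is not Aion-Torch's API. The helper names (`deepnorm_constants`, `layer_norm`, `deepnorm_residual`) are hypothetical, and the constants assume the encoder-only or decoder-only case described in the DeepNet paper, where a stack of N layers uses alpha = (2N)^(1/4) on the skip path and beta = (8N)^(-1/4) as an init gain.

```python
import math


def deepnorm_constants(num_layers):
    """Return (alpha, beta) for an encoder-only or decoder-only stack.

    Assumption: single-stack DeepNorm setting; encoder-decoder models
    use different constants and are not covered here.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be a positive integer")
    alpha = (2 * num_layers) ** 0.25   # scale applied to the skip path
    beta = (8 * num_layers) ** -0.25   # gain used when initializing sublayer weights
    return alpha, beta


def layer_norm(x, eps=1e-5):
    """Plain layer norm over a list of floats (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(x, branch_out, alpha):
    """Post-LN update: LayerNorm(alpha * x + f(x)), with f(x) passed in as branch_out."""
    return layer_norm([alpha * a + b for a, b in zip(x, branch_out)])


if __name__ == "__main__":
    alpha, beta = deepnorm_constants(12)
    print(f"alpha={alpha:.4f} beta={beta:.4f}")
    print(deepnorm_residual([1.0, 2.0, 3.0], [0.5, -0.5, 0.0], alpha))
```

A standard residual baseline corresponds to alpha = 1 with the model's usual initialization, so the same harness could cover both baselines by varying only these two constants.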