(no title)
RandyOrion | 2 months ago
> We analyze over 1,100 deep neural networks—including 500 Mistral-7B LoRAs and 500 Vision Transformers. We provide the first large-scale empirical evidence that networks systematically converge to shared, low-dimensional spectral subspaces, regardless of initialization, task, or domain.
I instantly thought of muon optimizer which provides high-rank gradient updates and Kimi-k2 which is trained using muon, and see no related references.
The 'universal' in the title is not that universal.
No comments yet.