(no title)
julius | 7 months ago
"The dot product itself is a geometrically impoverished measure, primarily capturing alignment while conflating magnitude with direction and often obscuring more complex structural and spatial relationships [10, 11, 4, 61, 17]. Furthermore, the way current activation functions achieve non-linearity can exacerbate this issue. For instance, ReLU (f (x) = max(0, x)) maps all negative pre-activations, which can signify a spectrum of relationships from weak dissimilarity to strong anti-alignment, to a single zero output. This thresholding, while promoting sparsity, means the network treats diverse inputs as uniformly orthogonal or linearly independent for onward signal propagation. Such a coarse-graining of geometric relationships leads to a tangible loss of information regarding the degree and nature of anti-alignment or other neg- ative linear dependencies. This information loss, coupled with the inherent limitations of the dot product, highlights a fundamental challenge."
mlnomadpy | 6 months ago