Basically, the real "non-linearity" in deep learning has always been orthogonality. Squashing functions make it easy for neurons to tap into that orthogonality, while most activation functions "lie" about it by clamping scores to 0, and a dot product of 0 between two vectors means they are orthogonal (and thus linearly independent). What I did was rely on both the angular information and the spatial information between the input x and the weight w to measure how "similar" they are.
The lower bound of the yat-product is 0, and it is reached only when the two vectors are orthogonal (and is approached as they move arbitrarily far apart).
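A minimal sketch of what such a similarity could look like, combining the angular term (squared dot product) with the spatial term (squared Euclidean distance). The exact form and the name `yat_product` are my assumptions from the description above, not a confirmed implementation:

```python
import numpy as np

def yat_product(x, w, eps=1e-6):
    # Hypothetical form assumed from the description:
    # angular information via the squared dot product,
    # spatial information via the squared Euclidean distance.
    # Score is 0 exactly when x and w are orthogonal, and
    # shrinks toward 0 as the two vectors move far apart.
    dot = np.dot(x, w)
    dist_sq = np.sum((x - w) ** 2)
    return dot ** 2 / (dist_sq + eps)

x = np.array([1.0, 0.0])
w_orth = np.array([0.0, 1.0])    # orthogonal to x -> score is 0
w_close = np.array([1.0, 0.1])   # aligned with x and nearby -> large score

print(yat_product(x, w_orth))    # 0.0
print(yat_product(x, w_close))
```

Note how, unlike ReLU, a score of 0 here genuinely signals orthogonality rather than being imposed by a cutoff.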