seamossfet | 1 year ago
If that is the case, then the "signal" here would be the softmax that encodes the dimensions captured by the query/key space. Since the noise is ideally the same in both softmax encodings, subtracting one from the other should "cancel out" the noise.
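A toy sketch of that cancellation, under loudly stated assumptions: each attention map is modeled as a probability distribution over keys. The first map is a mix of a "signal" pattern and a shared "noise" pattern. The second map is the noise pattern alone. The mixing weight `w`, the scale `lam`, and the helper `differential_map` are all hypothetical, chosen only for illustration. When `lam` matches the noise weight in the first map, the subtraction leaves only the scaled signal.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def differential_map(p1, p2, lam=1.0):
    """Hypothetical helper: elementwise p1 - lam * p2 (the subtraction discussed above)."""
    return [a - lam * b for a, b in zip(p1, p2)]

# Toy setup over 4 key positions (illustrative values only).
noise = softmax([0.3, 0.1, 0.4, 0.2])   # assumed shared "noise" pattern
signal = softmax([3.0, 0.0, 0.0, 0.0])  # assumed pattern focused on key 0

# Assumption: map 1 = w * signal + (1 - w) * noise; map 2 = noise alone.
w = 0.5
p1 = [w * s + (1 - w) * n for s, n in zip(signal, noise)]
p2 = noise

# Subtracting with lam equal to the shared noise weight removes the noise term.
diff = differential_map(p1, p2, lam=1 - w)
print([round(x, 4) for x in diff])
print([round(w * s, 4) for s in signal])  # matches diff up to float rounding
```

Both printed lists agree because the noise term appears identically in both maps. If the noise in the two maps differed, some of it would survive the subtraction, which is why the argument depends on the noise "ideally" being the same.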