machinelearning | 1 year ago
It has to be done hierarchically, so that the second step knows both what you attended to and the full context.
If the differential vector is computed from the same input as the attention vector, how does it know how to modify the attention vector correctly?
quantadev | 1 year ago