sojuz151 | 1 year ago
We are dealing with multi-headed attention, so each token is represented by multiple points: one key vector per head, each in that head's own key space. If that isn't enough, you can always increase the number of heads or the size of the key vector.
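A rough sketch of the point above, assuming standard multi-head attention where each head has its own key projection. The weights are random placeholders and `per_head_keys` is a hypothetical helper, not any library's API. Each token ends up as `num_heads` key vectors of length `d_k`, so raising either knob gives each token more room.

```python
import random


def project(x, w):
    """Multiply row vector x (length d_model) by matrix w (d_model x d_k)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def per_head_keys(tokens, num_heads, d_k, seed=0):
    """Return, for each token, a list of num_heads key vectors of length d_k.

    Each head gets its own (randomly initialised, placeholder) key projection,
    so every token becomes num_heads points, one per head's d_k-dim space.
    """
    rng = random.Random(seed)
    d_model = len(tokens[0])
    # One d_model x d_k projection matrix per head.
    w_k = [
        [[rng.gauss(0.0, 1.0) for _ in range(d_k)] for _ in range(d_model)]
        for _ in range(num_heads)
    ]
    return [[project(tok, w_k[h]) for h in range(num_heads)] for tok in tokens]


tokens = [[1.0, 0.0, 2.0, -1.0], [0.5, 0.5, 0.5, 0.5]]
keys = per_head_keys(tokens, num_heads=8, d_k=16)
print(len(keys[0]), len(keys[0][0]))  # 8 points per token, 16 dims each
```

Doubling `num_heads` doubles the number of points per token; doubling `d_k` doubles the dimensionality each point lives in.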
causal | 1 year ago