(no title)
_hl_ | 1 year ago
A transformer-based embedding model doesn’t just give you a vector for the entire input string, it gives you vectors for each token. These are then “pooled” together (eg averaged, or max-pooled, or other strategies) to reduce these many vectors down into a single vector.
Late chunking means changing this reduction to yield many vectors instead of just one.
No comments yet.