(no title)
shoelessone | 1 year ago
Is it possible to explain what this means in a way that somebody only roughly familiar with vectors and vector databases? Or recommend an article or further reading on the topic?
causal | 1 year ago
Essentially, each token of a text occupies a point in a high-dimensional space that represents meaning, and LLMs predict the next token by modifying the last token's representation with the context of all the tokens before it. Attention heads are basically a mechanism for choosing which prior tokens are most relevant and adjusting the last token's point in vector space accordingly.
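If it helps to see it concretely, here's a minimal numpy sketch of the scaled dot-product attention that a single head computes. The shapes and variable names are illustrative only; a real transformer first projects the embeddings through learned Q/K/V weight matrices, runs many heads in parallel, and stacks this in layers, but the core "score prior tokens, then blend their values" step looks like this:

    import numpy as np

    def attention(Q, K, V, causal_mask):
        # score each token pair by similarity (dot product), scaled by sqrt(dim)
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # mask out future positions so each token only sees the tokens before it
        scores = np.where(causal_mask, scores, -1e9)
        # softmax turns the scores into weights that sum to 1
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # output: each token's point is moved toward a weighted blend of the values
        return weights @ V

    # toy setup: 4 tokens with 8-dimensional embeddings (made-up numbers)
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    mask = np.tril(np.ones((4, 4), dtype=bool))  # lower triangle = "past only"
    out = attention(x, x, x, mask)               # self-attention over the embeddings
    print(out.shape)                             # (4, 8): same tokens, context-adjusted

The last row of `out` is the final token's vector after being "adjusted" by whichever prior tokens got high attention weights, which is the adjustment described above.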