The problem with embeddings is that they're basically inscrutable to anything but the model itself. It's true that they must encode the semantic meaning of the input sequence, but the learning process compresses it to the point that only the model's learned decoder head knows what to do with it. Anthropic has developed interpretable internal features for Claude 3 Sonnet [1], but from what I understand that requires somewhat expensive parallel training of a network whose sole purpose is to attempt to disentangle LLM hidden-layer activations.

[1] https://transformer-circuits.pub/2024/scaling-monosemanticit...
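For anyone curious what that parallel network looks like in spirit, here is a minimal sketch of a sparse autoencoder trained on hidden-layer activations. This is only the general dictionary-learning idea behind [1], not Anthropic's actual setup; all dimensions and names are illustrative.

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Toy sparse autoencoder over LLM hidden activations.

        An overcomplete latent code plus an L1 penalty pushes each latent
        toward firing sparsely, the hope being that individual latents end
        up corresponding to human-interpretable features.
        """
        def __init__(self, d_model=4096, d_dict=65536, l1_coeff=1e-3):
            super().__init__()
            self.enc = nn.Linear(d_model, d_dict)
            self.dec = nn.Linear(d_dict, d_model)
            self.l1_coeff = l1_coeff

        def forward(self, acts):                      # acts: (batch, d_model)
            features = torch.relu(self.enc(acts))     # sparse feature activations
            recon = self.dec(features)                # reconstruct the hidden state
            recon_loss = (recon - acts).pow(2).mean()
            sparsity_loss = self.l1_coeff * features.abs().sum(dim=-1).mean()
            return recon, features, recon_loss + sparsity_loss

The "somewhat expensive" part is that this has to be trained on huge numbers of activations collected from the base model, separately from (and in addition to) the base model's own training.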
spmurrayzzz|7 months ago
I've come up with so many failed analogies at this point that I've lost count (the concept of fast and slow clocks representing the positional index / angular rotation has been the closest I've come so far).
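The clock analogy actually maps fairly cleanly onto the standard sinusoidal positional encoding: each dimension pair is a clock hand rotating at its own frequency, fast hands in the low dimensions, slow hands in the high ones. A rough sketch (dimensions illustrative, not any particular model's values):

    import numpy as np

    def sinusoidal_positions(num_positions=64, d_model=16):
        """Each pair of dimensions is a 'clock hand' rotating at its own speed.

        Low dimension pairs spin fast (fine-grained position), high pairs spin
        slowly (coarse position); together they uniquely encode the index.
        """
        positions = np.arange(num_positions)[:, None]        # (pos, 1)
        dims = np.arange(0, d_model, 2)[None, :]              # (1, d_model/2)
        angles = positions / (10000 ** (dims / d_model))      # rotation angle per clock
        enc = np.zeros((num_positions, d_model))
        enc[:, 0::2] = np.sin(angles)                          # x-coordinate of each hand
        enc[:, 1::2] = np.cos(angles)                          # y-coordinate of each hand
        return enc

    print(sinusoidal_positions()[:4, :6])   # fast clocks change quickly across rows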
krackers|7 months ago
gbacon|7 months ago
https://ieeexplore.ieee.org/document/10500152
https://ieeexplore.ieee.org/document/10971523
kianN|7 months ago
LLM embeddings are so abstract and far removed from any human-interpretable or statistical corollary that even as the embeddings contain more information, that information becomes less accessible to humans.
[1] https://papers.nips.cc/paper_files/paper/2014/hash/b78666971...
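One way to make "less accessible" concrete: the information is often linearly recoverable from the embedding, but only by fitting a probe against labels you already have; nothing is readable off the raw vector by eye. A hedged sketch, where the data, dimensions, and the property being probed are all hypothetical stand-ins:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical setup: embeddings (n, d) from some model, plus labels for a
    # human-interpretable property (e.g. "is this sentence about finance?").
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 512))          # stand-in for real embeddings
    y = rng.integers(0, 2, size=1000)         # stand-in for property labels

    probe = LogisticRegression(max_iter=1000).fit(X[:800], y[:800])
    print("probe accuracy:", probe.score(X[800:], y[800:]))
    # With real embeddings, high probe accuracy shows the property is encoded,
    # yet no single dimension or value means anything to a human reader.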
gavmor|7 months ago
That's a really interesting three-word noun-phrase. Is it a term-of-art, or a personal analogy?