
mbowcut2 | 7 months ago

The problem with embeddings is that they're basically inscrutable to anything but the model itself. It's true that they must encode the semantic meaning of the input sequence, but the learning process compresses it to the point that only the model's learned decoder head knows what to do with it. Anthropic has developed interpretable internal features for Sonnet 3 [1], but from what I understand that requires somewhat expensive parallel training of a network whose sole purpose is to attempt to disentangle LLM hidden-layer activations.

[1] https://transformer-circuits.pub/2024/scaling-monosemanticit...
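For intuition, that disentangling network is (as I understand it) a sparse autoencoder trained on hidden-layer activations. A toy numpy sketch of the setup — all sizes, weights, and the penalty coefficient are made up for illustration, not Anthropic's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_features = 8, 32   # toy sizes; real SAEs use millions of features
W_enc = rng.normal(0, 0.1, (d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(0, 0.1, (d_features, d_model))
b_dec = np.zeros(d_model)

def sae_forward(activation):
    """Encode one hidden-layer activation into sparse features, then reconstruct it."""
    features = np.maximum(0.0, activation @ W_enc + b_enc)  # ReLU -> mostly-zero codes
    reconstruction = features @ W_dec + b_dec
    return features, reconstruction

x = rng.normal(size=d_model)        # stand-in for one LLM residual-stream activation
feats, recon = sae_forward(x)

# Training minimizes reconstruction error plus an L1 sparsity penalty, so each
# surviving feature can (hopefully) be read as one interpretable direction:
loss = np.sum((x - recon) ** 2) + 0.01 * np.sum(np.abs(feats))
```

The expensive part is that this auxiliary network has to be trained on huge numbers of activations sampled from the frozen LLM.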


spmurrayzzz|7 months ago

Very much agree re: inscrutability. It gets even more complicated when you add the LLM-specific concept of rotary positional embeddings to the mix. In my experience, it's been exceptionally hard to communicate that concept to even technical folks that may understand (at a high level) the concept of semantic similarity via something like cosine distance.

I've come up with so many failed analogies at this point that I've lost count (fast and slow clocks representing the positional index as angular rotation is the closest I've come so far).
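FWIW the clock analogy maps pretty directly onto what RoPE actually does. A toy numpy sketch (a single 2-d slice of a query/key vector, made-up frequency) showing the key property — the rotated dot product depends only on the *relative* offset between positions, not the absolute positions:

```python
import numpy as np

def rotate(vec2, pos, freq=1.0):
    """Rotate a 2-d query/key slice by an angle proportional to its token position."""
    theta = pos * freq
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return rot @ vec2

q = np.array([1.0, 0.0])
k = np.array([1.0, 0.0])

# Same relative offset (2) at different absolute positions -> same score.
score_a = rotate(q, 5) @ rotate(k, 3)
score_b = rotate(q, 10) @ rotate(k, 8)
```

In a real model each pair of embedding dimensions gets its own frequency — the "fast and slow clocks" — so nearby dimensions track fine position and others track coarse position.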

krackers|7 months ago

I've read that "No Position Embedding" seems to be better for long-term context anyway, so it's probably not something essential to explain.

kianN|7 months ago

This is exactly the challenge. When embeddings were first popularized by word2vec, they were interpretable because the word2vec model was revealed to be an implicit matrix factorization [1].

LLM embeddings are so abstract and far removed from any human-interpretable or statistical correlate that even as the embeddings contain more information, that information becomes less accessible to humans.

[1] https://papers.nips.cc/paper_files/paper/2014/hash/b78666971...
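As a toy illustration of that factorization view — made-up co-occurrence counts, with PPMI + SVD standing in for the shifted-PMI matrix word2vec implicitly factorizes, per the cited result:

```python
import numpy as np

# Toy symmetric co-occurrence counts for a 4-word vocabulary (invented numbers).
vocab = ["king", "queen", "man", "woman"]
counts = np.array([[0, 8, 6, 1],
                   [8, 0, 1, 6],
                   [6, 1, 0, 4],
                   [1, 6, 4, 0]], dtype=float)

# Positive PMI matrix: log of observed vs. expected co-occurrence, floored at 0.
total = counts.sum()
p_ij = counts / total
p_i = counts.sum(axis=1) / total
with np.errstate(divide="ignore"):
    pmi = np.log(p_ij / np.outer(p_i, p_i))
ppmi = np.maximum(pmi, 0.0)  # log(0) -> -inf is clipped to 0 here

# A low-rank SVD of this matrix yields word vectors comparable to word2vec's.
u, s, _ = np.linalg.svd(ppmi)
embeddings = u[:, :2] * np.sqrt(s[:2])
```

The interpretability comes from the pipeline itself: every coordinate traces back to explicit co-occurrence statistics, which is exactly what an LLM's hidden-layer embeddings lack.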

gavmor|7 months ago

> learned decoder head

That's a really interesting three-word noun-phrase. Is it a term-of-art, or a personal analogy?

TZubiri|7 months ago

Can't you decode the embeddings to tokens for debugging?

freeone3000|7 months ago

You can, but this is lossy (it drops context; it's a dimensionality reduction from 512 or 1024 dimensions down to a few bytes) and not invertible.
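Roughly, a logit-lens-style sketch of why it's lossy — toy sizes and random weights standing in for a real unembedding matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab_size = 512, 1000   # toy sizes

hidden = rng.normal(size=d_model)                  # one hidden-state embedding
W_unembed = rng.normal(size=(d_model, vocab_size)) # stand-in unembedding matrix

logits = hidden @ W_unembed
token_id = int(np.argmax(logits))  # 512 floats collapse to a single token id

# The map is many-to-one: small perturbations of `hidden` usually keep the
# same argmax token, so the token alone cannot reconstruct the embedding.
```

Useful for debugging which token a state is "leaning toward", but everything else the vector encoded is discarded.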

samrus|7 months ago

I mean that's true for all DL layers, but we talk about convolutions and the like often enough. Embeddings are relatively new, but there's not a lot of discussion of how crazy they are, especially given that they're the real star of the LLM, with transformers being a close second imo.

visarga|7 months ago

You can search for the closest matching words or expressions in a dictionary. It's trivial to see where an embedding points.
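E.g., a quick nearest-neighbor lookup by cosine similarity — toy 8-d embeddings and made-up words, just to show the mechanics:

```python
import numpy as np

def nearest_words(query, dictionary, k=3):
    """Return the k dictionary words whose embeddings have highest cosine similarity."""
    words = list(dictionary)
    mat = np.stack([dictionary[w] for w in words])
    sims = mat @ query / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]
    return [(words[i], float(sims[i])) for i in order]

rng = np.random.default_rng(2)
dictionary = {w: rng.normal(size=8) for w in ["cat", "dog", "car", "train"]}
query = dictionary["cat"] + 0.01 * rng.normal(size=8)  # slightly perturbed "cat"
top = nearest_words(query, dictionary)
```

This works well for output-space embeddings; whether it tells you anything about mid-layer activations is the question raised below.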

hangonhn|7 months ago

Can you do that in the middle of the layers? And if you do, would that word be that meaningful to the final output? Genuinely curious.