
sinemetu11 | 1 year ago

From the paper:

> Finally, we use our dataset and LRE-estimating method to build a visualization tool we call an attribute lens. Instead of showing the next token distribution like Logit Lens (nostalgebraist, 2020) the attribute lens shows the object-token distribution at each layer for a given relation. This lets us visualize where and when the LM finishes retrieving knowledge about a specific relation, and can reveal the presence of knowledge about attributes even when that knowledge does not reach the output.

They're just looking at what lights up in the embedding when they feed something in, and treating whatever lights up as "knowing" about that topic. The function is an approximation they fit on top of the model; it's important not to conflate it with the model's actual weights.
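The distinction can be sketched with toy code. Everything below (sizes, random matrices, variable names) is an illustrative stand-in, not the paper's implementation: a logit-lens-style readout decodes each layer's hidden state through the model's own unembedding, while an LRE is a separate affine map o ≈ W s + b fitted afterwards to approximate the model's relation decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, vocab = 16, 50   # toy sizes, chosen for the sketch
n_layers = 4

# Stand-ins for per-layer hidden states of one subject token,
# and for the model's unembedding matrix W_U.
hidden = [rng.normal(size=d_model) for _ in range(n_layers)]
W_U = rng.normal(size=(d_model, vocab))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Logit-lens-style readout: decode each layer's hidden state directly
# through the unembedding to see which token distribution it encodes.
for layer, h in enumerate(hidden):
    probs = softmax(h @ W_U)
    print(f"layer {layer}: top token id {probs.argmax()}")

# An LRE, by contrast, is an *extra* affine map fitted to approximate
# relation decoding -- an approximation layered on top of the model,
# not the model's own weights. Here W and b are random placeholders.
W = rng.normal(size=(d_model, d_model))
b = rng.normal(size=d_model)
approx_object = hidden[-1] @ W.T + b   # predicted object embedding
```

The point of the sketch is only that `W` and `b` live outside the model: whatever they "reveal" is a property of the fitted approximation as much as of the network itself.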

You can't separate the hallucinations from the model -- they exist precisely because of the lossy compression.
