top | item 42379237

bick_nyers | 1 year ago

I wonder whether you would want to use an earlier layer rather than the penultimate layer. I would imagine that the LLM uses the penultimate layer to "prepare" for the final dimensionality reduction, cleaning the signal so that it scores well on the loss function.
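A rough sketch of the idea, assuming the model exposes one list of per-token hidden states per layer. Everything here is hypothetical: `hidden_states` is toy data, not output from a real LLM, and `mean_pool` and `embedding_from_layer` are illustrative helpers, not any library's API. Mean pooling is just one possible way to turn token states into a single vector.

```python
from typing import List

Vector = List[float]


def mean_pool(token_states: List[Vector]) -> Vector:
    """Average the per-token vectors of one layer into a single embedding."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n for d in range(dim)]


def embedding_from_layer(hidden_states: List[List[Vector]], layer: int) -> Vector:
    """Build an embedding from the chosen layer.

    hidden_states[i] holds the per-token outputs of layer i.
    Negative indices count from the end: -1 is the final layer,
    -2 is the penultimate layer.
    """
    return mean_pool(hidden_states[layer])


# Toy stand-in for a model's per-layer outputs: 4 layers, 2 tokens, 2 dims.
hidden_states = [
    [[1.0, 0.0], [3.0, 2.0]],  # layer 0
    [[2.0, 2.0], [4.0, 6.0]],  # layer 1 (an "earlier" layer)
    [[0.0, 1.0], [2.0, 3.0]],  # layer 2 (penultimate)
    [[5.0, 5.0], [7.0, 9.0]],  # layer 3 (final)
]

penultimate = embedding_from_layer(hidden_states, -2)
earlier = embedding_from_layer(hidden_states, 1)
```

Which layer index actually gives the best embeddings would have to be checked empirically on a downstream task; the sketch only shows that switching layers is a one-argument change.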
