Gradient Descent on Token Input Embeddings

3 points | kp1197 | 7 months ago | lesswrong.com

1 comment

kp1197 | 7 months ago

Does performing gradient descent on token input embeddings lead to interpretable results? If not, why not?
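
For anyone unsure what the setup looks like, here is a minimal sketch of the idea on a toy model. It is not the method from the linked post. Everything in it is an assumption made for illustration: the "model" is a single frozen linear layer `W` rather than a transformer, the embedding table `E` is random, and the names `optimize_input_embedding` and `nearest_token` are hypothetical. The weights stay fixed, only the input vector is updated to make a target token more likely, and the result is then compared against the rows of the embedding table.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_input_embedding(W, x0, target, lr=0.05, steps=500):
    """Gradient descent on the input vector x only; W stays frozen.

    Toy assumption: logits = W @ x, loss = cross-entropy against `target`.
    Returns the optimized vector and the loss recorded before each step.
    """
    x = list(x0)
    losses = []
    for _ in range(steps):
        logits = [sum(w * v for w, v in zip(row, x)) for row in W]
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
        # d(loss)/d(logits) = probs - onehot(target)
        dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        # d(loss)/dx = W^T @ dlogits
        grad = [sum(W[k][j] * dlogits[k] for k in range(len(W))) for j in range(len(x))]
        x = [v - lr * g for v, g in zip(x, grad)]
    return x, losses


def nearest_token(E, x):
    """Index of the row of E closest to x (Euclidean), plus that distance."""
    dists = [math.sqrt(sum((e - v) ** 2 for e, v in zip(row, x))) for row in E]
    best = min(range(len(E)), key=dists.__getitem__)
    return best, dists[best]


rng = random.Random(0)
V, d = 8, 4  # hypothetical vocabulary size and embedding width
E = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(V)]  # input embedding table
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(V)]  # frozen toy "model"

# Start from the embedding of token 0 and push the output toward token 3.
x_opt, losses = optimize_input_embedding(W, E[0], target=3)
token_id, distance = nearest_token(E, x_opt)
print("loss:", losses[0], "->", losses[-1])
print("nearest token:", token_id, "at distance", distance)
```

The last step is where the interpretability question comes in: nothing in the optimization keeps `x_opt` close to any row of `E`, so the nearest token may be a poor description of the vector that was actually found.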