ajtejankar | 2 years ago

The plots show a 2-dimensional projection of the 8-dimensional feature vector of each paragraph, so the x and y axes are linear combinations of the 8 experts. Ideally, all of this would be in a single plot, but the sub-categories overlapped so much that it was hard to read, so I separated them by their broad categories. Also, the model has 32 layers, each with 8 experts, and 2 of those experts are picked for each token.
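For anyone who wants to try something similar, here is a rough sketch of one way such a plot could be built. It is not necessarily what was done for these plots: it assumes the 8-dimensional feature is the fraction of top-2 routing picks each expert receives across all tokens and layers of a paragraph, and that the projection is plain PCA. The names `expert_usage` and `project_2d` are hypothetical, and the router logits are random stand-ins for real model outputs.

```python
import numpy as np

# Figures taken from the comment: 32 layers, 8 experts per layer, 2 picked per token.
NUM_LAYERS, NUM_EXPERTS, TOP_K = 32, 8, 2


def expert_usage(router_logits):
    """Turn (tokens, layers, experts) router logits into an 8-dim usage vector.

    Assumption: the feature is the share of top-2 picks that each expert gets,
    pooled over every token and every layer of the paragraph.
    """
    # Indices of the TOP_K largest logits for each (token, layer) pair.
    top = np.argsort(router_logits, axis=-1)[..., -TOP_K:]
    counts = np.bincount(top.ravel(), minlength=NUM_EXPERTS)
    return counts / counts.sum()


def project_2d(features):
    """PCA via SVD. Each output axis is a linear combination of the 8 expert dims."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]  # shape (2, NUM_EXPERTS): the weights behind x and y
    return centered @ components.T, components


# Synthetic stand-in data: 50 "paragraphs" of varying token length.
rng = np.random.default_rng(0)
paragraphs = [
    rng.normal(size=(int(rng.integers(20, 60)), NUM_LAYERS, NUM_EXPERTS))
    for _ in range(50)
]
features = np.stack([expert_usage(p) for p in paragraphs])  # (50, 8)
points, components = project_2d(features)                    # (50, 2), (2, 8)
```

Each row of `components` holds the 8 weights that define one plot axis, which is what "linear combination of 8 different experts" means here. Pooling over layers discards per-layer structure; keeping a separate 8-dim vector per layer would be another reasonable choice.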
