top | item 37825365

cubie | 2 years ago

By "irrespective of their relevance to the language modeling task", the authors mean that the semantic meaning of the tokens is not important. These four initial tokens can be replaced entirely by newlines (i.e. tokens with essentially no semantic content), and the perplexity measured on a book of 65k tokens is nearly unaffected.

The key point is that these tokens are just used to "offload" attention scores; their semantic meaning is irrelevant.
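This "offloading" falls out of the softmax itself: attention weights must sum to 1, so when a query matches nothing in the context, the surplus mass has to land somewhere, and the first positions become a convenient dumping ground. A minimal sketch with toy pre-softmax scores (illustrative numbers, not taken from any real model):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-softmax scores for positions [sink, tok1, tok2, tok3]:
# the query matches none of the content tokens well, but softmax still
# has to distribute a total weight of 1.0 across the positions.
scores = [0.0, -4.0, -4.0, -4.0]
weights = softmax(scores)
# The sink position absorbs almost all of the attention mass, even
# though the token sitting there carries no useful meaning - which is
# why swapping it for a newline barely changes perplexity.
```

The sink's weight here is about 0.95 while each content token gets under 0.02, illustrating why the identity of the token at that position matters so little.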
