
kir-gadjello | 3 years ago

It could be done in a dozen ways. One beautiful method is to use the xPos positional embedding pioneered by Microsoft and scale the context window size at runtime (even better if your attention is subquadratic; again, there are a dozen varieties to pick from), see:

"A Length-Extrapolatable Transformer"

https://arxiv.org/abs/2212.10554

"Language Is Not All You Need: Aligning Perception with Language Models"

https://arxiv.org/abs/2302.14045

Notably, this positional embedding has been implemented by lucidrains in his x-transformers package: https://github.com/lucidrains/x-transformers/blob/main/x_tra...
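The core trick in xPos is a rotary (RoPE-style) rotation combined with a per-dimension exponential decay: queries are scaled up with position and keys scaled down, so their dot product depends only on the relative distance, with far-away pairs damped. Here is a minimal NumPy sketch of that idea; the function name `xpos`, the centering at the midpoint of the sequence, and the default `scale_base`/`gamma` values are illustrative assumptions, not the paper's or lucidrains' exact code:

```python
import numpy as np

def rotate_half(x):
    # For each adjacent pair (x1, x2) along the last axis, emit (-x2, x1),
    # i.e. a 90-degree rotation of every 2D slice (interleaved convention).
    x1, x2 = x[..., ::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., ::2] = -x2
    out[..., 1::2] = x1
    return out

def xpos(x, positions, downscale=False, scale_base=512, gamma=0.4, base=10000):
    """xPos-style embedding sketch (hypothetical helper).

    x: (seq, dim) queries or keys, dim even; positions: (seq,) integers.
    Apply with downscale=False to queries and downscale=True to keys.
    """
    seq, dim = x.shape
    half = dim // 2
    # RoPE rotation frequencies, one per 2D pair
    freqs = base ** (-np.arange(half) / half)              # (half,)
    angles = positions[:, None] * freqs[None, :]           # (seq, half)
    cos = np.repeat(np.cos(angles), 2, axis=-1)            # (seq, dim)
    sin = np.repeat(np.sin(angles), 2, axis=-1)
    # xPos per-dimension decay base zeta in (0, 1]; gamma is a smoothing knob
    zeta = (np.arange(half) / half + gamma) / (1 + gamma)  # (half,)
    # Centering the exponent at mid-sequence (an assumption here) keeps the
    # scales numerically tame; only the q/k ratio matters for attention.
    power = (positions[:, None] - seq // 2) / scale_base   # (seq, 1)
    scale = np.repeat(zeta[None, :], 2, axis=-1) ** power  # (seq, dim)
    if downscale:
        scale = 1.0 / scale
    rotated = x * cos + rotate_half(x) * sin
    return rotated * scale
```

Because the query scale at position m and the key scale at position n multiply to zeta**((m - n) / scale_base), the attention logit for a fixed query/key pair is a function of m - n alone, which is what lets the model extrapolate to longer contexts than it was trained on.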
