snippyhollow | 2 years ago
We changed RoPE's theta from 10k to 1M and fine-tuned with 16k-token-long sequences.
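(For context, a minimal sketch of rotary position embeddings with a configurable base "theta". Raising the base from 10,000 to 1,000,000 lowers every rotation frequency, so the same head dimension covers much longer positions before the angles wrap around. Names, shapes, and the pairing scheme below are illustrative, not the poster's actual implementation.)

```python
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10_000.0) -> np.ndarray:
    """Apply RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    # Per-pair rotation frequencies: base ** (-2i / dim), i = 0 .. dim/2 - 1.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)        # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]         # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                         # split into 2D pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                      # rotate each pair by its angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# The only change the comment describes is the base (10k -> 1M), followed by
# fine-tuning on 16k-token sequences so the model adapts to the new angles.
q = rope_rotate(np.random.randn(16_384, 128), np.arange(16_384), base=1_000_000.0)
```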
malwrar | 2 years ago
Curious, what led you to adjusting the parameters this way? Also, have you guys experimented with ALiBi [1], which claims better extrapolative results than rotary positional encoding?

[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you're skimming)
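(Rough sketch of the idea in the linked ALiBi paper: instead of rotating queries and keys, add a per-head linear penalty to the attention logits proportional to the key-query distance. The geometric slope schedule follows the paper; the function name and shapes are illustrative only.)

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Return a bias of shape (num_heads, seq_len, seq_len) to add to attention logits."""
    # Geometric slope schedule: 2^(-8*1/H), 2^(-8*2/H), ..., 2^(-8*H/H).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)   # (H,)
    pos = np.arange(seq_len)
    distance = np.maximum(pos[:, None] - pos[None, :], 0)              # how far each key lies behind each query
    return -slopes[:, None, None] * distance[None, :, :]

# Usage (causal masking applied separately):
#   logits = q @ k.T / np.sqrt(d)
#   logits += alibi_bias(seq_len, num_heads)
bias = alibi_bias(seq_len=8, num_heads=4)
```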
ttul | 2 years ago
Undoubtedly, they have tried ALiBi…
unknown | 2 years ago
[deleted]