aljungberg | 3 years ago
Is there a primer on what RWKV does differently? According to the GitHub page, the key seems to be multiple channels of state with different decay rates, giving, I assume, a combination of short- and long-term memory. But isn't that what LSTMs were supposed to do too?
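Here is a toy sketch of the "channels of state with different decay rates" idea, under the simplest assumption: each channel keeps a running state that is multiplied by its own fixed decay factor at every step before the new input is added. This is not RWKV's actual update rule. The function name `decaying_state` and the decay values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def decaying_state(inputs: Sequence[Sequence[float]],
                   decays: Sequence[float]) -> List[List[float]]:
    """Run a per-channel exponentially decaying state over a sequence.

    inputs: one input vector per time step, each with one value per channel.
    decays: one fixed decay factor in (0, 1) per channel (illustrative assumption).
    Returns the state after every step.
    """
    state = [0.0] * len(decays)
    history = []
    for x_t in inputs:
        # Old information shrinks by the channel's fixed factor; new input is added.
        state = [d * s + x for d, s, x in zip(decays, state, x_t)]
        history.append(list(state))
    return history


# An impulse on both channels at t=0, followed by 10 steps of zero input.
x = [[1.0, 1.0]] + [[0.0, 0.0]] * 10
states = decaying_state(x, decays=[0.5, 0.99])
fast, slow = states[-1]
print(f"fast={fast:.4f} slow={slow:.4f}")  # fast=0.0010 slow=0.9044
```

After ten steps the fast channel retains 0.5**10 of the impulse and has essentially forgotten it, while the slow channel still holds about 0.99**10. In this sketch the decay is a constant per channel, whereas an LSTM's forget gate is recomputed at every step from the current input and the previous hidden state.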
thegeomaster|3 years ago
[1]: https://arxiv.org/abs/1901.02860
gok|3 years ago
solomatov|3 years ago
Hendrikto|3 years ago
euclaise|3 years ago
eternalban|3 years ago
https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-a...
swyx|3 years ago
Any sources to read more about this, please? It's the first I've heard of it.
nl|3 years ago
georgehill|3 years ago
https://karpathy.github.io/2015/05/21/rnn-effectiveness
solomatov|3 years ago