top | item 23415069


reader5000 | 5 years ago

Recurrent neural nets are the general term for nets with memory as you describe. Indeed, LSTMs, a type of recurrent net, used to be state of the art on language tasks until the GPT transformer models. I'm sure somebody somewhere is working to make a transformer with recurrency. The neural turing machine mentioned in another comment is such an example but it seems to have been abandoned.
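To make "memory" concrete: here's a toy vanilla-RNN forward pass in NumPy, with made-up dimensions and random weights (not any particular model; an LSTM adds gating on top of this same recurrence). The hidden state h is the memory carried from token to token:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8  # made-up input / hidden sizes
W_x = rng.normal(scale=0.1, size=(d_h, d_in))  # input -> hidden
W_h = rng.normal(scale=0.1, size=(d_h, d_h))   # hidden -> hidden (the recurrence)

def rnn_forward(tokens):
    h = np.zeros(d_h)                    # memory starts empty
    for x in tokens:                     # one step per token
        h = np.tanh(W_x @ x + W_h @ h)   # new state depends on old state
    return h

seq = rng.normal(size=(10, d_in))        # a toy 10-token "sequence"
final_state = rnn_forward(seq)
print(final_state.shape)                 # (8,)
```

The key point is that the same weights are reused at every step, and information about early tokens survives only through h.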

The main problem with recurrent models is that it's hard to train them with backprop. For example, GPT-3 can handle sequences of up to ~2000 tokens; I'm not sure what the longest sequences LSTMs could be trained on were, but it was probably less.
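The standard workaround for this training difficulty is truncated backprop-through-time: the hidden state is carried across the whole sequence, but gradients are cut off at fixed-length chunk boundaries. A rough NumPy sketch (toy sizes, vanilla-RNN Jacobians standing in for a real LSTM) comparing full vs. truncated gradient products:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 8, 20                    # hidden size, truncation window (made up)
W_h = rng.normal(scale=0.1, size=(d, d))

h = np.zeros(d)
full = np.eye(d)                # gradient product through every step (full BPTT)
trunc = np.eye(d)               # gradient product within the current chunk only
for t in range(200):
    if t % k == 0:
        trunc = np.eye(d)       # chunk boundary: stop the gradient here
    h = np.tanh(W_h @ h + rng.normal(size=d) * 0.1)
    J = (1 - h**2)[:, None] * W_h   # Jacobian of this tanh step
    full = J @ full
    trunc = J @ trunc

# The full 200-step product collapses toward zero; the truncated one
# keeps a usable magnitude, at the cost of ignoring long-range credit.
print(np.linalg.norm(trunc), np.linalg.norm(full))
```

This is exactly the trade-off the comment points at: truncation makes training feasible but caps how far back credit assignment can reach.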

discuss

order

gwern | 5 years ago

LSTMs typically forget after more than a few hundred tokens (vanishing gradients?), so while you could probably BPTT 2000+ steps these days, there wouldn't be much point.
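You can see this forgetting directly in the forward pass of a toy recurrent net (a vanilla-RNN stand-in for an LSTM, made-up sizes): perturb the first token and measure how much the final hidden state changes as the sequence grows. The influence of token 0 shrinks roughly geometrically, which is the forward-pass face of vanishing gradients:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
W_h = rng.normal(scale=0.1, size=(d, d))
W_x = rng.normal(scale=0.1, size=(d, d))

def final_state(x0, T):
    h = np.zeros(d)
    rs = np.random.default_rng(4)        # same later tokens on every call
    for t in range(T):
        x = x0 if t == 0 else rs.normal(size=d)
        h = np.tanh(W_x @ x + W_h @ h)
    return h

x = rng.normal(size=d)
diffs = {}
for T in (5, 50, 500):
    # effect of perturbing the *first* token on the *final* state
    diffs[T] = np.linalg.norm(final_state(x, T) - final_state(x + 1.0, T))
    print(T, diffs[T])                   # shrinks as T grows
```

Real LSTM gates slow this decay considerably compared to a vanilla RNN, which is why they manage hundreds rather than tens of tokens, but the qualitative picture is the same.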

> I'm sure somebody somewhere is working to make a transformer with recurrency. The neural turing machine mentioned in another comment is such an example but it seems to have been abandoned.

Yeah, there's a bunch of Transformer variants which either use recurrency, compression for long-range context, or efficient attention approximations for windows so large as to obviate recurrency. The NTM hasn't been shown useless so much as alternatives like Transformers have proven way easier to implement & scale up to similar performance, but it pops up occasionally; a particularly surprising recent appearance was Nvidia's GameGAN, which uses a NTM-like memory module for learning to model Pac-Man: https://nv-tlabs.github.io/gameGAN/
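The "windowed attention" idea in miniature: each position attends only to the previous w positions, so cost is O(T·w) rather than O(T²). This is an illustration of the general pattern, not any particular paper's method; single head, no learned projections, toy sizes:

```python
import numpy as np

def sliding_window_attention(x, w):
    """x: (T, d) token vectors; each position attends to its last w positions."""
    T, d = x.shape
    out = np.zeros_like(x)
    for t in range(T):
        lo = max(0, t - w + 1)
        keys = x[lo:t + 1]                    # local window only
        scores = keys @ x[t] / np.sqrt(d)     # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()              # softmax over the window
        out[t] = weights @ keys               # weighted average of the window
    return out

x = np.random.default_rng(5).normal(size=(16, 4))
y = sliding_window_attention(x, w=4)
print(y.shape)   # (16, 4)
```

Stacking layers like this lets information propagate beyond w positions indirectly, which is one way such models trade exact long-range attention for tractable windows.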