(no title)
lettergram | 5 months ago
https://medium.com/capital-one-tech/why-you-dont-necessarily...
At the time it was clear to everyone on the team that RNNs, just like transformers later on, are general-purpose frameworks that really only need more data and scale to work. In the 2018-2020 era, and probably still today, they were slower to train. They were also less prone to certain pitfalls, but overall had the same characteristics.
In 2019-2020 I was convinced that transformers would give way to a better architecture. RNNs in particular trained faster and required less data, particularly when combined with several architectural components I won't get into. I believe that's still true today, though I haven't worked on it in the last 2-3 years.
That said, transformers "won" because they are better overall building blocks and don't require the careful handling RNNs do. Combined with the compute optimizations that now exist, I don't see that changing in the near term. Folks are even working to convert transformers to RNNs:
https://medium.com/@techsachin/supra-technique-for-linearizi...
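To give a sense of why a transformer layer can be converted to an RNN at all, here is a minimal sketch of the general idea behind linearized attention: replace the softmax similarity with a kernel feature map `phi`, after which causal attention can be computed with a fixed-size running state instead of re-attending over the whole history. The function names and the choice `phi(x) = elu(x) + 1` are illustrative assumptions, not the specifics of the SUPRA technique linked above.

```python
import numpy as np

def phi(x):
    # elu(x) + 1: a positive feature map, a common choice for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_parallel(Q, K, V):
    """Transformer-style O(T^2) form: each step attends over all past positions."""
    T = Q.shape[0]
    out = np.zeros_like(V)
    for t in range(T):
        w = phi(Q[t]) @ phi(K[: t + 1]).T      # unnormalized attention weights
        out[t] = (w @ V[: t + 1]) / w.sum()
    return out

def linear_attention_recurrent(Q, K, V):
    """Same outputs computed as an RNN with constant-size state per step."""
    d = Q.shape[1]
    S = np.zeros((d, V.shape[1]))              # running sum of phi(k_t) v_t^T
    z = np.zeros(d)                            # running sum of phi(k_t), for normalization
    out = np.zeros_like(V)
    for t in range(len(Q)):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        out[t] = (phi(Q[t]) @ S) / (phi(Q[t]) @ z)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
# Both formulations agree; the recurrent one is what makes RNN-style inference possible
assert np.allclose(linear_attention_parallel(Q, K, V),
                   linear_attention_recurrent(Q, K, V))
```

The recurrent form is the payoff: generation needs only the state `(S, z)` rather than a growing KV cache, which is exactly the RNN-like inference profile these conversion efforts are after.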
There are also RNN-based models beating Qwen 3 8B on certain benchmarks.
I suspect that, over time, the other methods my team explored, along with other types of networks and nodes, will continue to push state-of-the-art LLMs beyond transformers.
algo_trader | 5 months ago
Counter-consensus is where the alpha is...
Do you think RNN/RWKV models have an edge in verifiable domains with tree-search at inference time? You could use cheaper GPUs and do multiple sampling.
(but of course, it's hard to beat the sunk cost of a foundation model)