top | item 35505827


imustachyou | 2 years ago

S4 and its class of state-space models are an impressive mathematical and signal-processing innovation, and I thought it was awesome how they destroyed previous baselines for long-range tasks.

Have there been any state-space models adapted for arbitrary text generation?

Language models like ChatGPT are trained to predict new words based on the previous ones and are excellent at generation, a harder task than translation or classification. I'm doubtful about the adaptability of text models that deal with fixed-size inputs/outputs and don't have an architecture as natural for generating indefinitely long sequences.


sdenton4 | 2 years ago

Go read about S4, from these authors. It's about having a learnable state-space model that can be efficiently implemented as either an RNN or a (very long) convolution, depending on the needs of training or inference.
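To make the duality concrete, here's a minimal sketch of a discretized linear state-space model computed both ways. The parameters are random toy values (real S4 uses a structured HiPPO initialization and a clever kernel computation, neither shown here), but the equivalence of the two views is exact: the recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k unrolls into a causal convolution of the input with the kernel K = [CB, CAB, CA²B, ...].

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretized SSM: x_k = A x_{k-1} + B u_k,  y_k = C x_k
# (illustrative random parameters, not S4's structured initialization)
N, L = 4, 16                       # state size, sequence length
A = rng.normal(size=(N, N)) * 0.3  # scaled down so the demo stays stable
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)             # input sequence

# Recurrent view: step through the sequence one input at a time
# (cheap per-step state updates -- good for autoregressive inference)
x = np.zeros((N, 1))
y_rnn = np.zeros(L)
for k in range(L):
    x = A @ x + B * u[k]
    y_rnn[k] = (C @ x).item()

# Convolutional view: precompute the kernel K = [CB, CAB, CA^2B, ...]
# and convolve with the whole input at once -- good for parallel training
K = np.zeros(L)
Ak_B = B.copy()
for k in range(L):
    K[k] = (C @ Ak_B).item()
    Ak_B = A @ Ak_B

y_conv = np.convolve(u, K)[:L]     # causal convolution, truncated to length L

print(np.allclose(y_rnn, y_conv))  # True -- the two views agree
```

S4's contribution is largely about computing that kernel efficiently for very large L; the naive loop above is O(L·N²) and only meant to show why the two implementations produce identical outputs.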

Buttons840 | 2 years ago

Do these scale as well as transformers? My understanding is that classic RNNs don't scale well, and that's one reason transformers became popular.

As a pleb who doesn't even own a data center, I've been hoping that a superior machine learning architecture will be discovered that doesn't scale well. We would be fortunate if our personal computers end up being half as good as Microsoft's or Amazon's best models; fortunate if the best architecture gains little from an additional 10,000 GPUs. This would help spread the benefits of AI evenly among anyone with a phone or computer -- a utopia compared to the other possibility, that everyone can learn how to build AI, but only those with a few hundred million to throw at a data center can actually control the means of production -- err, I mean, the means of intelligence.

Philosophically, this wouldn't be unlike people. Humans are still the greatest intelligence we're aware of, and humans don't scale. I'm hoping computer intelligence ends up not scaling well either.