top | item 36707011

iflp | 2 years ago

This seems to be about easier classification tasks without too many samples, for which TF-IDF also works well (Table 3). But more generally, gzip for text modeling might make sense. Quoting http://bactra.org/notebooks/nn-attention-and-transformers.ht... :
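For context, the compression-based classifier under discussion is tiny to sketch. Here's a rough illustration of the idea (toy data and k=1 are my own simplifications; the paper uses kNN over normalized compression distance):

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed UTF-8 encoding of s.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance: if knowing a helps compress b
    # (shared patterns), compressing the concatenation is cheap.
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, train: list[tuple[str, str]]) -> str:
    # Nearest neighbour under NCD (k=1 here for brevity).
    return min(train, key=lambda tl: ncd(query, tl[0]))[1]

# Hypothetical toy training set.
train = [
    ("the match ended two nil after a late goal", "sports"),
    ("the striker scored twice in the final", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("markets fell as bond yields climbed", "finance"),
]
print(classify("a dramatic goal decided the final match", train))
```

No training step, no parameters — which is exactly why it only competes on the easier, low-sample settings.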

> Once we have a source-coding scheme, we can "invert" it to get conditional probabilities; we could even sample from it to get a generator. (We'd need a little footwork to deal with some technicalities, but not a heck of a lot.) So something I'd really love to see done, by someone with the resources, is the following experiment:

> - Code up an implementation of Lempel-Ziv without the limitations built in to (e.g.) gzip; give it as much internal memory to build its dictionary as a large language model gets to store its parameter matrix. Call this "LLZ", for "large Lempel-Ziv".

> - Feed LLZ the same corpus of texts used to fit your favorite large language model. Let it build its dictionary from that. (This needs one pass through the corpus...)

> - Build the generator from the trained LLZ.

> - Swap in this generator for the neural network in a chatbot or similar. Call this horrible thing GLLZ.

> In terms of perplexity, GLLZ will be comparable to the neural network, because Lempel-Ziv does, in fact, do universal source coding.
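The "invert the coder to get a generator" step can be sketched at toy scale. Below is my own character-level LZ78-style version (not Shalizi's proposal at LLM scale, and skipping the technicalities he mentions): a one-pass dictionary build whose continuation counts double as empirical conditional probabilities to sample from.

```python
import random
from collections import defaultdict

def lz78_counts(text: str) -> dict:
    # One pass of LZ78-style parsing: grow the current phrase while it
    # is in the dictionary, otherwise add it and restart. Along the way,
    # count which character follows each phrase -- these counts are the
    # "inverted" code: conditional probabilities P(next char | phrase).
    counts = defaultdict(lambda: defaultdict(int))
    seen = {""}
    phrase = ""
    for ch in text:
        counts[phrase][ch] += 1
        if phrase + ch in seen:
            phrase += ch
        else:
            seen.add(phrase + ch)
            phrase = ""
    return counts

def generate(counts: dict, n: int, rng: random.Random) -> str:
    # Sample from the empirical conditionals, using the longest phrase
    # the dictionary knows as context; fall back to the empty context.
    out, phrase = [], ""
    for _ in range(n):
        dist = counts.get(phrase)
        if not dist:
            phrase = ""
            dist = counts[""]
        chars, weights = zip(*dist.items())
        ch = rng.choices(chars, weights=weights)[0]
        out.append(ch)
        phrase = phrase + ch if (phrase + ch) in counts else ""
    return "".join(out)
```

A real "LLZ" would replace the toy corpus with an LLM-scale one and lift gzip's window/dictionary limits, which is the part that needs someone with resources.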

Maybe someone on HN will have resources for such an experiment?

ailef | 2 years ago

This is really interesting. How would this compare to neural networks in terms of performance and the resources needed for training and inference?

Would it be leaner and run on less hardware, or would it eventually reach the same complexity?