top | item 38089807


heavyarms | 2 years ago

I've been thinking along the same lines. The token window IMO should be a conceptual inverted pyramid, where the most recent tokens are retained verbatim but previous iterations are compressed/pooled more and more as the context grows. I'm sure there's some effort/research in this direction. It seems pretty obvious.
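A toy sketch of that inverted pyramid, purely illustrative (the function name, the doubling schedule, and the `<pool:N>` markers standing in for pooled embeddings are all my own assumptions, not from any published method):

```python
def compress_context(tokens, keep_recent=8, base_chunk=2):
    """Keep the most recent tokens verbatim; pool progressively larger
    chunks of older tokens into single summary slots. Here a marker
    string stands in for a real pooled embedding."""
    recent = tokens[-keep_recent:]
    older = tokens[:-keep_recent]
    compressed = []
    chunk = base_chunk
    # Walk backwards: the further from the present, the bigger the chunk.
    i = len(older)
    while i > 0:
        start = max(0, i - chunk)
        segment = older[start:i]
        # Placeholder for pooling (e.g. mean of segment embeddings).
        compressed.append(f"<pool:{len(segment)}>")
        i = start
        chunk *= 2  # double the compression ratio each step back
    compressed.reverse()
    return compressed + recent
```

With 20 tokens and `keep_recent=8`, the 12 older tokens collapse into three slots covering 2, 4, and 6 tokens, so the effective window is 11 slots instead of 20, and the ratio keeps improving as the context grows.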


matsemann | 2 years ago

But some of the earlier tokens are also the most important ones, right? Like the instructions and rules you want it to follow.

visarga | 2 years ago

Phrase embeddings could bring a 32x reduction in sequence length because:

> Text Embeddings Reveal (Almost) As Much As Text. ... We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/abs/2310.06816
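To make the 32x figure concrete: if each 32-token phrase is replaced by one embedding vector, the sequence the model attends over shrinks by that factor. A minimal sketch with mean pooling as a stand-in for a learned phrase encoder (the function and padding scheme are my own, not the paper's method):

```python
import numpy as np

def pool_phrases(token_embs, phrase_len=32):
    """Pool fixed-length phrases of token embeddings into single vectors.

    token_embs: (seq_len, dim) array of per-token embeddings.
    Returns (ceil(seq_len / phrase_len), dim): one vector per phrase.
    """
    seq_len, dim = token_embs.shape
    pad = (-seq_len) % phrase_len  # zero-pad to a multiple of phrase_len
    if pad:
        token_embs = np.vstack([token_embs, np.zeros((pad, dim))])
    phrases = token_embs.reshape(-1, phrase_len, dim)
    return phrases.mean(axis=1)
```

A 64-token sequence becomes 2 phrase vectors; a 32k-token context becomes 1k slots, at the cost of whatever the decoder can no longer recover from the pooled representation.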

a_wild_dandan|2 years ago

They are. Moreover, the idea that AI companies are missing and/or not implementing this “obvious” tactic is hilarious. Folks, these approaches have profound consequences for training and inference performance. Y’all aren’t pointing out some low hanging fruit here, lol