top | item 39390886

kiraaa | 2 years ago

Maybe they are using ring attention on top of their 128k model.

ein0p|2 years ago

More likely some clever take on RAG. There’s no way the full 1M context is available at all times; parts of it are probably retrievable on demand. Hence the retrieval-like use cases you see in the demos. The goal is to find a thing, not to find patterns at a distance.

kiraaa|2 years ago

Could be true; we can only speculate.