Personally, I have found that Mistral 7B (with its native 8K context, and decent results even when stretched further) performs much better than Llama 13B tunes for storytelling, where that long context really matters.
And I think the optimized backends should support that sliding 16K context soon...
Anyway, the point is that a huge context really helps certain types of queries, and VRAM usage stays reasonable with a 7B model.
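To see why the VRAM stays reasonable, here's a back-of-envelope sketch of the memory cost at 16K context. The config values (32 layers, 8 KV heads via grouped-query attention, head dim 128) are Mistral 7B's published architecture; treat them as assumptions if you're running a different checkpoint.

```python
# Rough VRAM estimate for Mistral 7B at long context (fp16).
# Architecture values assumed from the published Mistral 7B config.
N_LAYERS = 32
N_KV_HEADS = 8        # grouped-query attention: only 8 KV heads, not 32
HEAD_DIM = 128
BYTES_FP16 = 2

def kv_cache_bytes(seq_len: int) -> int:
    # 2x for keys and values, one cache entry per layer per KV head
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * seq_len * BYTES_FP16

weights_gb = 7e9 * BYTES_FP16 / 1e9            # ~14 GB of fp16 weights
kv_16k_gib = kv_cache_bytes(16 * 1024) / 2**30  # exactly 2 GiB

print(f"weights: ~{weights_gb:.0f} GB, KV cache @ 16K: ~{kv_16k_gib:.1f} GiB")
```

The GQA design is what makes this cheap: with the full 32 KV heads the cache at 16K would be 8 GiB instead of 2, which is a big part of why a 7B model with huge context still fits on a single consumer GPU.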