item 43274869

dulakian | 1 year ago

I am using the Q6_K_L quant, and it uses about 40 GiB of VRAM, including the KV cache.

Device 1 [NVIDIA GeForce RTX 4090] MEM[||||||||||||||||||20.170Gi/23.988Gi]

Device 2 [NVIDIA GeForce RTX 4090] MEM[||||||||||||||||||19.945Gi/23.988Gi]
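The two per-GPU readings add up to the roughly 40 GiB total mentioned above. A quick check, using only the numbers shown in the readout:

```python
# Per-GPU memory in use, in GiB, copied from the two device readings above.
used_gib = [20.170, 19.945]

# Total VRAM in use across both RTX 4090s.
total_gib = sum(used_gib)
print(f"total: {total_gib:.3f} GiB")  # total: 40.115 GiB
```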


lostmsu | 1 year ago

What's the context length?

dulakian | 1 year ago

The model supports a context length of 131,072 tokens, but I only have 48 GB of VRAM, so I run it with a context of 32,768.
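Why the context length matters for VRAM: the KV cache grows linearly with the number of context tokens, so going from 32,768 to 131,072 tokens makes it 4x larger. Below is a rough sketch of the usual estimate (one K and one V tensor per layer). The layer count, KV-head count, head dimension, and fp16 element size are hypothetical values chosen for illustration, not the architecture of the model in this thread. A quantized KV cache would use a smaller `bytes_per_elem`.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes.

    Each layer stores one K and one V tensor of shape
    [n_ctx, n_kv_heads, head_dim]; bytes_per_elem=2 assumes fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


GIB = 2 ** 30

# Hypothetical architecture, NOT the model discussed in this thread.
layers, kv_heads, head_dim = 64, 8, 128

for ctx in (32768, 131072):
    size = kv_cache_bytes(layers, kv_heads, head_dim, ctx)
    print(f"n_ctx={ctx:>6}: {size / GIB:.1f} GiB")
```

With these assumed parameters the cache is 8.0 GiB at 32,768 tokens and 32.0 GiB at 131,072, on top of the model weights themselves.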