ben_s | 2 months ago

Once you oversubscribe GPU memory, performance usually collapses. Frameworks like vLLM can explicitly offload data such as the KV cache to CPU memory, but that's an application-level tradeoff, not transparent GPU virtual memory.
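
To put rough numbers on when a KV cache stops fitting, here is a back-of-the-envelope sketch. The function names are hypothetical, and the model shape (32 layers, 32 KV heads, head dimension 128, fp16) and the 8 GiB budget are assumptions picked for illustration, not measurements from any particular setup.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache one token occupies across all layers."""
    # The leading 2 covers one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_tokens_in_budget(budget_bytes, per_token_bytes):
    """Whole tokens that fit in a given memory budget."""
    return budget_bytes // per_token_bytes


# Assumed shape: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes).
per_token = kv_cache_bytes_per_token(32, 32, 128)

# Assumed budget: 8 GiB of GPU memory left over for the cache.
budget = 8 * 1024**3

print(per_token)                                # bytes per cached token
print(max_tokens_in_budget(budget, per_token))  # tokens before spilling
```

Under these assumptions each cached token costs 512 KiB, so the budget holds 16384 tokens across all in-flight requests. Anything past that has to be evicted, recomputed, or moved to CPU memory by the framework, which is the explicit tradeoff in question.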