top | item 34223938

Interestingly, it sounds like offloading could be made quite efficient in a batch setting if you primarily care about throughput rather than latency: the cost of moving weights is paid once per step and shared across the whole batch. Though I guess for most current LLM applications, latency is quite important.
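
To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `offload_step`, the 1 s weight-load time, the 10 ms of compute per sequence, and the simplification that loading and compute don't overlap. It is not measured from any real system.

```python
def offload_step(batch_size, load_s=1.0, compute_s_per_seq=0.01):
    """Toy model of one decoding step with offloaded weights.

    Assumes (hypothetically) that weights are streamed in once per step,
    costing load_s seconds regardless of batch size, and that compute
    scales linearly with the batch. No transfer/compute overlap.
    """
    step_s = load_s + batch_size * compute_s_per_seq  # latency of one step
    throughput = batch_size / step_s  # tokens per second across the batch
    return step_s, throughput


if __name__ == "__main__":
    for b in (1, 8, 64, 512):
        step_s, tps = offload_step(b)
        print(f"batch={b:4d}  step latency={step_s:.2f}s  throughput={tps:.1f} tok/s")
```

Under these made-up numbers, going from batch 1 to batch 64 raises throughput by roughly 40x while each step only gets about 1.6x slower, but every individual request still waits at least the full load time per token, which is why this helps batch jobs much more than interactive use.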
