echion | 1 month ago

> you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part

Are you doing this with vLLM, or some other model-running library/setup?