echion | 1 month ago

> you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part

Are you doing this with vLLM, or some other model-running library/setup?
coder543 | 1 month ago

They're probably referencing this article: https://blog.exolabs.net/nvidia-dgx-spark/
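The split in the quoted comment rests on a common rule of thumb. Prefill (processing the prompt, which drives TTFT) is roughly compute-bound. Token generation (decode) is roughly memory-bandwidth-bound, because each new token re-reads the model weights. Under that assumption you would send prefill to whichever box has more compute and decode to whichever has more bandwidth. Below is a back-of-envelope sketch of that reasoning. Every device number in it is a hypothetical placeholder, not a measured spec for the Spark or the M3 Ultra. The `Device` class and function names are made up for illustration. It also ignores the cost of moving the KV cache between machines, which is a real overhead in practice.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    tflops: float         # hypothetical effective compute, TFLOP/s
    bandwidth_gbs: float  # hypothetical effective memory bandwidth, GB/s


def prefill_seconds(dev: Device, params_b: float, prompt_tokens: int) -> float:
    # Prefill treated as compute-bound: ~2 FLOPs per parameter per prompt token.
    flops = 2 * params_b * 1e9 * prompt_tokens
    return flops / (dev.tflops * 1e12)


def decode_seconds_per_token(dev: Device, weights_gb: float) -> float:
    # Decode treated as bandwidth-bound: every generated token reads all weights once.
    return weights_gb / dev.bandwidth_gbs


def total_seconds(prefill_dev: Device, decode_dev: Device, params_b: float,
                  weights_gb: float, prompt_tokens: int, output_tokens: int) -> float:
    # TTFT (prefill) plus generation time; KV-cache transfer is deliberately ignored.
    ttft = prefill_seconds(prefill_dev, params_b, prompt_tokens)
    gen = output_tokens * decode_seconds_per_token(decode_dev, weights_gb)
    return ttft + gen


if __name__ == "__main__":
    # Hypothetical machines: one compute-heavy, one bandwidth-heavy.
    compute_box = Device("compute-heavy", tflops=100, bandwidth_gbs=250)
    bandwidth_box = Device("bandwidth-heavy", tflops=25, bandwidth_gbs=800)
    args = dict(params_b=8, weights_gb=16, prompt_tokens=1000, output_tokens=100)
    for pre, dec in [(compute_box, compute_box),
                     (bandwidth_box, bandwidth_box),
                     (compute_box, bandwidth_box)]:
        t = total_seconds(pre, dec, **args)
        print(f"prefill on {pre.name}, decode on {dec.name}: {t:.2f} s")
```

With these made-up numbers, the split configuration comes out fastest. Real gains depend on actual throughput, quantization, and how quickly the KV cache can be streamed from one machine to the other.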