somerandomdude2 | 4 months ago
"A paper presented at SOSP 2025 details how token-level scheduling helped one GPU serve multiple LLMs, reducing demand from 1,192 to 213 H20s."
Which, if you scale it, matches the GP's statement.
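A quick sanity check on the numbers quoted above (the scaling to the GP's figure is the commenter's inference, not stated in the paper):

```python
# Back-of-envelope check of the reduction reported in the quoted
# SOSP 2025 result: 1,192 H20s needed before, 213 after.
before = 1192  # H20 GPUs without token-level scheduling
after = 213    # H20 GPUs with it

factor = before / after
print(f"reduction factor: {factor:.1f}x")  # ~5.6x fewer GPUs
savings = 1 - after / before
print(f"GPUs saved: {savings:.0%}")        # ~82% of the fleet
```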
yorwba | 4 months ago