top | item 42868250


SlavikCA | 1 year ago

A 2x CPU system may be slower for LLMs than a 1x CPU system.

That is because in a 2x CPU system, the model may have to be accessed across NUMA nodes, and cross-node access has 10% - 30% of the memory bandwidth.
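
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes token generation is memory-bandwidth bound and that every token reads all the weights once, one region after another. The function name, the 40 GB model size, the 200 GB/s local bandwidth, and the 50% remote share are all made-up illustration values, not measurements; only the 10% - 30% ratio comes from the comment above.

```python
def tokens_per_second(model_gb, local_bw_gbs, remote_share=0.0, remote_bw_ratio=1.0):
    """Rough decode-speed estimate (hypothetical helper, not a benchmark).

    Assumes each generated token streams all weights from memory once,
    and that local and remote reads happen one after the other.
    """
    local_gb = model_gb * (1.0 - remote_share)
    remote_gb = model_gb * remote_share
    # Remote (cross-NUMA) reads run at a fraction of the local bandwidth.
    seconds_per_token = (
        local_gb / local_bw_gbs
        + remote_gb / (local_bw_gbs * remote_bw_ratio)
    )
    return 1.0 / seconds_per_token


# Illustrative inputs only: 40 GB of weights, 200 GB/s local bandwidth.
single_socket = tokens_per_second(40, 200)
# Half the weights live on the other socket, read at 30% or 10% of local speed.
dual_best = tokens_per_second(40, 200, remote_share=0.5, remote_bw_ratio=0.3)
dual_worst = tokens_per_second(40, 200, remote_share=0.5, remote_bw_ratio=0.1)

print(f"1x CPU, all local:        {single_socket:.2f} tok/s")
print(f"2x CPU, remote at 30%:    {dual_best:.2f} tok/s")
print(f"2x CPU, remote at 10%:    {dual_worst:.2f} tok/s")
```

This models the unfavorable case where the inference threads are not NUMA-aware. A runtime that pins threads and keeps each socket reading its own local share of the weights would not pay this penalty, so treat the output as a pessimistic bound under the stated assumptions.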
