binarybanana | 4 years ago
It has worked surprisingly well: I can compile Firefox on all cores while playing a game, and it's completely imperceptible that anything else is happening, both in terms of performance and interactivity/latency. No issues with this "burstiness" or stuck/deadlocked processes, either. "nice" is basically useless in comparison.
geofft | 4 years ago
The problem is that we're running services, not batch jobs, so we do want them to make meaningful forward progress; we can't set their shares to zero. We just don't want a misconfigured or runaway job to starve out the rest of the machine, even when the other jobs aren't trying to use 100% CPU.
A specific sub-case here is capacity planning. Right now, even with CPU shares, you can request one CPU for a computationally intensive multi-threaded or multi-process task, and if the rest of the box is running internal web services with sporadic traffic, the task will easily be able to use the whole machine. But then if you launch eight instances of that same job on the machine, they'll all perform much worse than the first one did. So ideally we want to proactively limit CPU usage, so that application developers/operators get realistic expectations about performance, and in turn we get realistic information about how heavily our cluster is actually used.
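The "proactive limit" here maps to a CPU bandwidth quota rather than shares. As a minimal sketch (assuming cgroup v2 mounted at /sys/fs/cgroup, run as root; the group name "mysvc" and $PID are placeholders), capping a job at one CPU regardless of how idle the machine is looks like:

```shell
# Create a cgroup for the service (hypothetical name).
mkdir /sys/fs/cgroup/mysvc

# cpu.max takes "$MAX $PERIOD" in microseconds: allow at most
# 100ms of CPU time per 100ms period, i.e. a hard cap of 1 CPU.
# This throttles the job even when the rest of the box is idle,
# unlike cpu.weight (shares), which only matters under contention.
echo "100000 100000" > /sys/fs/cgroup/mysvc/cpu.max

# Move the job's process into the group.
echo "$PID" > /sys/fs/cgroup/mysvc/cgroup.procs
```

With this in place, launching eight copies of the job (each in its own capped group) performs roughly the same as one copy did, which is exactly the predictability argument above: the quota trades away opportunistic bursting for realistic capacity numbers.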