item 19525778

The creation of the io.latency block I/O controller

51 points | ot | 7 years ago | lwn.net

12 comments

abhinai | 7 years ago
So many algorithms, data structures and patterns in computer science were based on the idea that hard disks are slow and their seek times are especially slow. However, the introduction of solid state drives has changed this. These days I have almost no server with an actual disk drive in it.

I wonder (1) how the software has evolved to face this new reality, (2) how much of the old code is still being used, and (3) what performance penalties we pay for code that assumes disk performance characteristics but actually runs on solid state drives?

thatsaguy | 7 years ago
Incidentally, SSDs also benefit from read and write locality. Although there's no seek penalty, there's still a large benefit from dispatching multiple reads and writes from/to the same cell. You get those for free by trying to minimize "seeks", although the underlying logic behind this optimization is simpler.
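The "minimize seeks" optimization mentioned above usually amounts to nothing more than sorting pending requests by offset before dispatch, elevator-style. A toy sketch in Python (the function name and request format are made up for illustration):

```python
def dispatch_sorted(requests):
    """Toy elevator-style ordering: sort pending I/O requests by
    byte offset so adjacent requests hit the same track (HDD) or
    the same flash page/cell (SSD). Each request is (offset, length)."""
    return sorted(requests, key=lambda r: r[0])

pending = [(4096, 512), (0, 512), (8192, 512), (4608, 512)]
print(dispatch_sorted(pending))
# → [(0, 512), (4096, 512), (4608, 512), (8192, 512)]
```

On an HDD this ordering minimizes head movement; on an SSD the same ordering still helps because requests to nearby addresses tend to land in the same flash page, which is the "simpler underlying logic" referred to above.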
zepearl | 7 years ago
Sorry, I don't understand what this article is talking about. Is it about the classical CFQ/Deadline I/O schedulers and/or the new multiqueue schedulers? Or something completely different? Thx :)
ignoramous | 7 years ago
(Per my understanding) This isn't about a new I/O scheduler. The io.latency I/O controller sits a layer above the I/O scheduler and reduces the I/O queue size as a means to throttle write requests. The key insight in doing this per control group is to use a hierarchical non-blocking data structure to represent related workloads, which can be classified into 'fast' and 'slow', with 'fast' penalising the 'slow' group (by way of reducing the I/O queue size available to it) iff 'fast' sees higher read latencies. An 'unrelated' group doesn't get penalised, as it isn't part of the same hierarchy.

There are some gotchas, though. For instance, priority inversion can occur where 'fast' needs more memory but 'slow' needs to be paged out first, resulting in 'fast' being penalised indirectly by the write-throttling on 'slow'.

The approach here is reminiscent of HFSC qdisc for TCP/IP (which only a handful of people understand?): https://www.cs.cmu.edu/~hzhang/HFSC/main.html
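To make the 'fast'/'slow' setup above concrete, here is a rough sketch of what configuring io.latency looks like through the cgroup v2 filesystem. The group names, the device number 8:0, and the 10ms target are all made up for illustration; this assumes cgroup v2 is mounted at /sys/fs/cgroup with the io controller available.

```shell
# Create sibling groups under the same parent (same hierarchy).
mkdir -p /sys/fs/cgroup/fast /sys/fs/cgroup/slow

# Enable the io controller for children of the root group.
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control

# Ask for a 10ms (10000us) completion-latency target on device 8:0
# for 'fast'. If 'fast' misses this target, the kernel throttles its
# siblings (here, 'slow') by shrinking the queue depth available to them.
echo "8:0 target=10000" > /sys/fs/cgroup/fast/io.latency

# Place the latency-sensitive workload into 'fast'.
echo $$ > /sys/fs/cgroup/fast/cgroup.procs
```

A group outside this parent never gets throttled on 'fast''s behalf, which is the "unrelated group" point above.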

viraptor | 7 years ago
The way I understand it, it's a new thing, stacked with the selectable I/O schedulers. But it can reach deeper than they can: specifically, when dirty pages are being written back to disk in the background, the existing schedulers were not aware of how to manage that. This one is.

(please correct me if I got something wrong)

shereadsthenews | 7 years ago
It kinda boggles the mind that an org as successful and as well-staffed as FB still does something as primitive as Chef on a cron.