Kernel – Fix cluster_write() inefficiency

84 points | protomyth | 9 years ago | gitweb.dragonflybsd.org

53 comments

[+] re|9 years ago|reply
Looks like HAMMER is DragonFlyBSD's filesystem, for those (like me) who may be unfamiliar with the distro: http://www.dragonflybsd.org/hammer/

Previous discussion about the filesystem vs. ZFS: https://news.ycombinator.com/item?id=8829194

[+] _lbaq|9 years ago|reply
Not really fair to call it a distro, that gives the wrong impression; it was forked off FreeBSD in 2002 or 2003.
[+] ksec|9 years ago|reply
That was against HAMMER 1, which is completely different to HAMMER 2.
[+] jemfinch|9 years ago|reply
Reading the diff, I'm neither surprised that this bug existed nor will I be surprised when similar bugs arise.

There must be a more readable way to write even low-level code like this.

[+] koverstreet|9 years ago|reply
Yes, there is. That code needs iterators.
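The iterator suggestion can be illustrated outside the kernel. A minimal Python sketch (all names hypothetical, not the actual HAMMER code) of centralizing the offset/length arithmetic in one generator, so no caller re-derives the boundary math inline:

```python
BLOCK_SIZE = 16384  # hypothetical cluster block size

def aligned_extents(offset, length, block_size=BLOCK_SIZE):
    """Yield (offset, length) chunks covering [offset, offset + length),
    split on block boundaries. Every caller shares this one copy of the
    boundary arithmetic instead of repeating it by hand."""
    end = offset + length
    while offset < end:
        # distance to the next block boundary, capped by remaining bytes
        chunk = min(block_size - (offset % block_size), end - offset)
        yield offset, chunk
        offset += chunk

def write_all(dev, offset, data):
    """A writer built on the iterator never touches the offset math."""
    pos = 0
    for off, n in aligned_extents(offset, len(data)):
        dev.write(off, data[pos:pos + n])
        pos += n
```

The point is the design, not the language: the error-prone index bookkeeping lives in exactly one place, which is where a bug like this one gets caught.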
[+] yoklov|9 years ago|reply
honestly, i think it's hard to say this generally without more context into the code. imo nothing there looks too egregious (or at least not abnormally so) for very low level code.
[+] remy_luisant|9 years ago|reply
Couldn't they dump out the list of all individual write commands sent to the disk and see that the writes were in a suboptimal pattern? That would have been one of the first things I'd do when testing a file system of any kind: make sure the thing writes what I want written, in the way I want it written.
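That check is easy to sketch. Assuming a recorded trace of (offset, length) pairs in issue order (the trace format here is hypothetical), a few lines of Python can score how sequential the writes actually were:

```python
def sequential_fraction(trace):
    """Given a list of (offset, length) write commands in issue order,
    return the fraction of back-to-back pairs where the next write
    starts exactly where the previous one ended, i.e. writes the disk
    can stream sequentially."""
    if len(trace) < 2:
        return 1.0
    sequential = sum(
        1 for (prev_off, prev_len), (off, _) in zip(trace, trace[1:])
        if off == prev_off + prev_len
    )
    return sequential / (len(trace) - 1)
```

A clean streaming workload should score near 1.0; a filesystem that interleaves or back-tracks will score visibly lower, which is the kind of red flag this comment is asking about.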
[+] cat199|9 years ago|reply
People write bugs occasionally.

You're talking about the person who spent months tracking down what everyone else thought was a code or compiler bug but actually proved to be a subtle CPU bug requiring vendor errata releases. The same person wrote this log-structured, snapshottable, syncable, low-memory, real-world filesystem from scratch, for the highly parallel operating system he has been rearchitecting for over 10 years now. There are consistent status update posts outlining the on-disk filesystem structure, its relation to the virtual memory subsystem, and the in-kernel parallelism feeding IO writes, as well as constant code commits implementing these things. So I'd reserve the need to back-seat drive here unless I could back it up with a comparable body of work.

http://www.theregister.co.uk/2012/03/07/amd_opteron_bug_drag...

[+] koverstreet|9 years ago|reply
You don't even need to do anything that hard. All you need to do is run basic benchmarks and compare the results to what other filesystems can do, and when yours is about half as fast that should make you go "huh"...
[+] josh64|9 years ago|reply
Isn't NCQ meant to do this? I mean reorder the write commands into an optimal order.
[+] microcolonel|9 years ago|reply
Always wondered where the bandwidth was going when I was using HAMMER. Had to use UFS when testing last time.
[+] HugoDaniel|9 years ago|reply
How often does a sequential write happen in an SSD ?
[+] pmontra|9 years ago|reply
As often as it happens with HDD, I think.

Anyway, with my SSD:

I unzipped a 300 MB Raspbian Lite zip into a 1.3 GB image yesterday, and extracted parts from an 8 GB video with ffmpeg last weekend. I expect all of those to be sequential given enough free space on the SSD (same for HDDs).

I also work with docker and vagrant and building those images at half speed can't be good.

[+] digikata|9 years ago|reply
Almost always if one wants to optimize...
[+] Upvoter33|9 years ago|reply
wow, this is an embarrassing bug. Saying it doubles bandwidth makes it sound like an optimization, when in fact it is just (finally) doing what you'd think it should have been doing all along. Yikes. How long was it doing this with no one noticing?
[+] kzrdude|9 years ago|reply
Data loss and corruption are always a lot more embarrassing. Did they have tests that ensured correctness (of data) in the slow version? Then they were already delivering on the quality of the file system.
[+] josteink|9 years ago|reply
So if I'm reading this patch correctly it's now writing things in blocks instead of as single bytes? Is that it?

And, well, yes. Of course that does wonders for throughput. And I would certainly hope that someone who writes a file-system knows that.

Edit: Re-reading the patch, it seems like what's being done here is taking a (failed) attempt at writing in blocks and actually making it write in proper block sizes. Yay C, I guess.
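A toy model of the before/after behaviour being described, assuming the fix amounts to coalescing adjacent buffers into larger block-sized device writes (this is a hypothetical sketch for illustration, not the actual patch):

```python
BLOCK = 65536  # hypothetical maximum cluster write size

def coalesce(writes, max_len=BLOCK):
    """Merge adjacent (offset, length) writes into larger commands,
    capped at max_len, so the device sees a few big sequential writes
    instead of many small ones."""
    merged = []
    for off, n in sorted(writes):
        if merged and merged[-1][0] + merged[-1][1] == off \
                and merged[-1][1] + n <= max_len:
            merged[-1][1] += n  # extend the previous command in place
        else:
            merged.append([off, n])
    return [(off, n) for off, n in merged]
```

If the clustering logic mis-computes the boundary condition, adjacent writes never merge and every buffer goes out as its own command, which is roughly the half-bandwidth symptom the thread is discussing.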

[+] TheDong|9 years ago|reply
Wow, somehow you managed to read the patch without reading the commit message that explains what actually changed.

No, you didn't read it correctly.