
Mkfile(8) is severely syscall limited on OS X

102 points| mpweiher | 9 years ago |blog.metaobject.com

88 comments

[+] drewg123|9 years ago|reply
" I did not check on other operating systems, but my guess is that the results would be similar."

Actually, no. Irrespective of mkfile being mostly MacOSX specific, most syscalls on MacOSX are just plain slow compared to Linux (or FreeBSD). I think this is partially an artifact of performance not being a major metric for most system calls on MacOSX. The system calls where performance is a critical metric (like timekeeping) use a kernel/user shared page interface to avoid the syscall entirely in the vast majority of cases.

I recall doing benchmarking roughly 10 years ago to find the best interface for communicating with a mostly OS-bypass HPC driver. MacOSX ioctls were some multiple more expensive than Linux, but the native IOKit Mach IPC was even more expensive than that. Sigh. There was a similar story for sockets in non-OS-bypass mode, where simply writing on a socket was far more expensive than on Linux.

Somebody needs to resurrect lmbench & do a comparison of the various x86 kernels available these days. Maybe that would shame Apple into focusing on performance.

Drew

[+] 3wisemen|9 years ago|reply
Using mkfile compiled for Linux on a tmpfs gets me 1550 MiB/s.
[+] cbsmith|9 years ago|reply
I'm pretty sure that even at their slowest, syscalls aren't going to match IO overhead. It's not like mkfile is CPU bound.
[+] dom0|9 years ago|reply
And that's why sane people will tell you not to use something like st_blksize as the IO size, because that's far too small. Many tools use 32 KiB, but that can be limiting even on Linux and needlessly uses more CPU (especially if FUSE is involved -- reads are not coalesced at the VFS layer! Read and write merging is done at the block device layer by the IO scheduler there!). Something like 512k-1M is a sane default these days.
[+] gens|9 years ago|reply
Maybe I'm misunderstanding you, but... Before each run, "echo 3 > /proc/sys/vm/drop_caches" is executed. The processor is in "performance" mode.

dd if=./big_file.file of=/dev/null bs=64K

535020540 bytes (535 MB, 510 MiB) copied, 8.70128 s, 61.5 MB/s

dd if=./big_file.file of=/dev/null bs=4K

535020540 bytes (535 MB, 510 MiB) copied, 8.36283 s, 64.0 MB/s

dd if=./big_file.file of=/dev/null bs=1K

535020540 bytes (535 MB, 510 MiB) copied, 8.82508 s, 60.6 MB/s

The only thing that went up is the CPU usage (which is still negligible at ~60 MB/s).

dd if=/dev/zero of=/dev/null bs=1K count=1024K

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.06113 s, 1.0 GB/s

dd if=/dev/zero of=/dev/null bs=4K count=256K

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.343595 s, 3.1 GB/s

dd if=/dev/zero of=/dev/null bs=32K count=32K

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.160191 s, 6.7 GB/s
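
For anyone who wants to see what those three /dev/zero runs imply per call, here is a rough bit of arithmetic in Python. It only divides the numbers quoted above; the function names are made up for illustration, and it assumes each dd iteration is one read plus one write of the given block size.

```python
# Figures copied from the three /dev/zero -> /dev/null dd runs above.
RUNS = [
    # (block size in bytes, block count, elapsed seconds)
    (1024, 1024 * 1024, 1.06113),
    (4 * 1024, 256 * 1024, 0.343595),
    (32 * 1024, 32 * 1024, 0.160191),
]


def per_iteration_us(count, seconds):
    """Average wall time per dd iteration (one read plus one write), in microseconds."""
    return seconds / count * 1e6


def throughput_gb_per_s(block_size, count, seconds):
    """Throughput in decimal GB/s, the unit dd reports."""
    return block_size * count / seconds / 1e9


for block_size, count, seconds in RUNS:
    print(
        f"bs={block_size:>5}  "
        f"{per_iteration_us(count, seconds):.2f} us/iteration  "
        f"{throughput_gb_per_s(block_size, count, seconds):.1f} GB/s"
    )
```

The time per iteration grows much more slowly than the block size (roughly 1 us at 1K versus roughly 5 us at 32K), which is consistent with a fixed per-call cost dominating at small block sizes when no real disk is involved.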

[+] cbsmith|9 years ago|reply
The whole thing is kind of odd. Normally I'd want to leave filling the file to the filesystem...
[+] im3w1l|9 years ago|reply
Oh wow, I had no idea that write buffers needed to be on the order of a megabyte on modern systems.
[+] mpweiher|9 years ago|reply
Neither did I! And in fact it took almost 2 days to question/overcome my assumptions.

"It's not what you don't know that kills you. It is what you know for sure that ain't true" -- Mark Twain

[+] mcguire|9 years ago|reply
Does anyone know offhand the architecture of the Mach/BSD hybrid kernel in this case? It sounds suspiciously like a problem with the IBM Microkernel back in about 1992, and with the OSF's mkLinux, which is apparently a predecessor of Mac OS.

Specifically, do the syscalls writing the file go through multiple protection domains?

https://maniagnosis.crsr.net/2011/07/this-is-comment-i-made-...

[+] Someone|9 years ago|reply
"the OSF's mkLinux, which is apparently a predecessor of Mac OS."

Apple worked on MkLinux, but it isn't technically a predecessor to Mac OS X. The two do not share a single line of code; if they did, Apple would have to license Mac OS under the GPL.

XNU, the Mac OS kernel (https://en.wikipedia.org/wiki/XNU) isn't a real microkernel; the functionality of a microkernel is there, but quite a bit of code was added that, in a true microkernel, would live in userspace.

[+] Matt3o12_|9 years ago|reply
This still begs the question I've had since day one when learning about buffers: what the heck is the recommended buffer size? I've seen a lot of old code that uses extremely small buffer sizes as well as some recent code that uses extremely large buffer sizes (up to 10MB).

What is a good sweet spot that runs well on older hardware (less than 10 years old) and new hardware? And should network buffers be bigger or smaller than disk buffers?

[+] throwawayish|9 years ago|reply
Network buffers are a completely different animal, also harder to optimize...

Disk IO buffers are easy nowadays: just use something like a MB, which is just fine for almost any application and doesn't add up to much memory use (unless you're writing many files concurrently, which can bring its own problems as well).

[+] RyanZAG|9 years ago|reply
Judging by the post here, 512 KB buffers seem like a good bet?
[+] gens|9 years ago|reply
Just do a test with dd or some small piece of code. Why rely on hearsay?
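
A "small piece of code" for this could look something like the sketch below. It is not what mkfile does internally, just one way to time zero-filled writes at different buffer sizes. `time_writes` and the list of sizes are made up for illustration, and it assumes the temp directory sits on the disk you actually care about (on some systems /tmp is a tmpfs).

```python
import os
import tempfile
import time


def time_writes(total_bytes, buffer_size, directory=None):
    """Write total_bytes of zeros using write() calls of buffer_size bytes.

    Returns (elapsed_seconds, number_of_write_calls). The fsync is inside the
    timed region so the page cache does not hide the cost of reaching the disk.
    """
    buf = bytes(buffer_size)  # zero-filled buffer, reused for every call
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        calls = 0
        remaining = total_bytes
        start = time.perf_counter()
        while remaining > 0:
            chunk = buf if remaining >= buffer_size else buf[:remaining]
            remaining -= os.write(fd, chunk)  # os.write goes straight to write(2)
            calls += 1
        os.fsync(fd)
        elapsed = time.perf_counter() - start
        return elapsed, calls
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    total = 64 * 1024 * 1024  # 64 MiB per run, small enough to repeat quickly
    for size in (512, 4096, 32 * 1024, 512 * 1024, 1024 * 1024):
        elapsed, calls = time_writes(total, size)
        mib_per_s = total / elapsed / (1024 * 1024)
        print(f"{size:>8} B buffer: {calls:>7} calls, {mib_per_s:8.1f} MiB/s")
```

Interpreter overhead per call is higher than in C, so the absolute numbers will not match mkfile or dd, but the shape of the curve across buffer sizes should still be visible.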
[+] amelius|9 years ago|reply
So is this any different on Linux? Why is OSX in the title?

Are syscalls more expensive on OSX?

[+] akandiah|9 years ago|reply
> So is this any different on Linux? Why is OSX in the title?

mkfile is not available on Linux. The equivalent utility is xfs_mkfile or fallocate.

[+] mschuster91|9 years ago|reply
I guess it is the same. Try doing a dd if=/dev/sdX of=/dev/null or dd if=/dev/zero of=/dev/sdX (be warned, the latter wipes the disk!).

Then, experiment with various buffer sizes (bs=1k, 10k, 100k, 1M) - I personally use 1M with dd.

Be warned on OS X you have to use /dev/rdiskX instead of /dev/diskX, as the latter is a buffered version that usually is slower.

[+] Svip|9 years ago|reply
I think the main difference is that `mkfile' doesn't exist on Linux? As far as I can gather, it's a Sun OS utility, that BSD has and therefore OS X.

I've read that some Linux distributions have `mkfile', but it's just a script wrapper around `dd'.

[+] mpweiher|9 years ago|reply
Good point, the reason is simple: I didn't check on Linux (and have updated the post to reflect this). I am guessing it would be similar, but don't have any data.
[+] gwu78|9 years ago|reply
While HN is discussing XNU syscalls, anyone know why the most fundamental of syscalls, execve, according to NeXT/Apple's post-UNIX wisdom needs to have an extra "char *apple[]" argument vector?

Not to imply that it is "hidden" but I am curious if any HN users know about this and understand its purpose?

[+] msbarnett|9 years ago|reply
When in doubt, consult the source: https://github.com/opensource-apple/xnu/blob/53c5e2e62fc4182...

The upshot is that it looks like it constructs some non-env variable environment data on the program's stack after the posix environment.

The absolute path of the executable, followed by preemption free zone addresses, entropy, a configuration setting for malloc allocation strategy, and the address of the main thread's stack afaict.

[+] rollthehard6|9 years ago|reply
Some years ago I picked up the habit from a predecessor of testing such things with dd instead, that way you can experiment with the effect of different block sizes, so like -

dd if=/dev/zero of=/ddtest.out bs=64k count=65536

[+] pedrocr|9 years ago|reply
The graph could use an X axis label. I'm assuming it's "Buffer size in kB". It would also be nice to include the datapoint you started with (250MB/s from a 512-byte buffer).
[+] masklinn|9 years ago|reply
That's actually written below the graph, rather than as a graph label

> X-Axis is buffer size in KB. The original 512 byte size isn't on there because it would be 0.5KB or the entire axis would need to be bytes, which would also be awkward at the larger sizes. Also note that the X-Axis is logarithmic.

[+] raverbashing|9 years ago|reply
Well, there's this option (which looks like it would be the way to go for its purpose):

-n Create an empty filename. The size is noted, but disk blocks aren't allocated until data is written to them

Also the description on the man file says: mkfile creates one or more files that are suitable for use as NFS-mounted swap areas. The sticky bit is set, and the file is padded with zeroes by default
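
For illustration, the "size is noted, but blocks aren't allocated" behaviour can be approximated by setting a file's length without writing to it. This is only a sketch: I have not checked how mkfile -n is implemented, the helper names are made up, and whether the file really ends up sparse depends on the filesystem.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose length is `size` bytes without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; reads of the gap return zeros


def allocated_bytes(path):
    """Bytes allocated on disk (st_blocks is typically counted in 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_file(path, 1024 * 1024 * 1024)
        print("apparent size:", os.path.getsize(path))
        print("allocated    :", allocated_bytes(path))
```

A file like this is useless for measuring write speed, since almost nothing is written until data actually lands in it.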

[+] grigjd3|9 years ago|reply
That's missing the point. The author wanted to test the write speed and found that mkfile was not a good tool to test with.
[+] snorrah|9 years ago|reply
Is this something that would be affected by file system defaults? I see it writes in 512-byte chunks by default. When MacOS moves to APFS, is this something that might be altered as a result and thus get a 'free' speedup?
[+] mkj|9 years ago|reply
It's a command, not a syscall!
[+] quinnftw|9 years ago|reply
Yes but mkfile uses syscalls to create and write to the file. With a smaller buffer more write syscalls are required to populate the file.
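
To put rough numbers on that, here is a small sketch of the arithmetic. The function names are made up, it assumes every write() call transfers a full buffer, and the 250 MB/s at 512 bytes figure is the one mentioned elsewhere in this thread.

```python
def write_calls_needed(file_size, buffer_size):
    """Number of write() calls to fill file_size bytes, assuming full-buffer writes."""
    return -(-file_size // buffer_size)  # integer ceiling division


def seconds_per_call(buffer_size, bytes_per_second):
    """Average time budget per write() call at a given observed throughput."""
    return buffer_size / bytes_per_second


GIB = 1024 ** 3

print(write_calls_needed(GIB, 512))          # 2097152 calls with a 512-byte buffer
print(write_calls_needed(GIB, 1024 * 1024))  # 1024 calls with a 1 MiB buffer

# At 250 MB/s with 512-byte writes, each call takes about 2 microseconds on average.
print(seconds_per_call(512, 250_000_000))
```

So for a 1 GiB file, going from a 512-byte buffer to a 1 MiB buffer cuts the number of write calls by a factor of 2048.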
[+] Sharlin|9 years ago|reply
The article doesn't make such a claim as far as I can see.