cdown's comments
cdown | 5 years ago | on: A Facebook crawler was making 7M requests per day to my stupid website
I just sent you an e-mail, you can also reply to that instead if you prefer not to share those details here. :-)
cdown | 6 years ago | on: 1195725856 and other mysterious numbers
It's back now. Thanks!
cdown | 8 years ago | on: In defence of swap: common misconceptions
You can do this with eBPF/BCC by using funclatency (https://github.com/iovisor/bcc/blob/master/tools/funclatency...) to trace swap-related kernel calls. It depends on exactly what you want, but take a look at mm/swap.c and you'll probably find a function with the semantics you want.
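As a rough sketch (assuming bcc-tools is installed and you have root; the install path, and especially the function name, vary by distro and kernel version, so check mm/swap.c and mm/page_io.c for what actually exists on your kernel), an invocation might look like:

```shell
# Hypothetical example: a 10-second latency histogram for swap_readpage(),
# one function on the swap-in path. Requires root and bcc-tools; the tool
# path and the traced function name may differ on your system.
sudo /usr/share/bcc/tools/funclatency -d 10 swap_readpage
```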
cdown | 8 years ago | on: In defence of swap: common misconceptions
SIGTERM and friends? :-)
If your application just drops state on the floor when an intentionally trappable signal is sent to it or its children, that seems like a bug.
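For illustration, a minimal shell sketch (the handler and state file are hypothetical stand-ins) of a process that persists state on SIGTERM instead of dropping it:

```shell
#!/bin/sh
# Hypothetical sketch: persist state when SIGTERM arrives rather than
# dropping it on the floor. SIGTERM is trappable; SIGKILL is not.
STATE_FILE=$(mktemp)

save_state() {
    # Stand-in for real persistence logic (flush buffers, checkpoint, etc.)
    echo "state saved" >> "$STATE_FILE"
}
trap save_state TERM

kill -TERM $$    # simulate a supervisor sending SIGTERM to us
echo "after handler: $(cat "$STATE_FILE")"
```

The trap runs between commands, so by the time the final `echo` executes the state has been written.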
cdown | 8 years ago | on: In defence of swap: common misconceptions
What about applications that have memory-bound performance characteristics? In these cases, saving a bit of memory often directly translates into throughput, which translates into $$$.
This isn't theoretical: a number of services I've run, both in the past and currently, literally make more money because of swap. By using memory more efficiently and monitoring memory pressure metrics instead of just "freeness" (which is not really measurable anyway), we allow more efficient use of the machine overall.
cdown | 8 years ago | on: In defence of swap: common misconceptions
As for the analogy -- there are metrics you can use today to bat away grandma before she starts hoarding too much. We have metrics for how much stuff grandma is putting in the house (memory.stat), for the rate at which we kick our own stuff out of the house just to appease grandma, only to realise we removed stuff we actually need (memory.stat -> workingset_refault), and so on. Using these and Johannes' recent work on memdelay (see https://patchwork.kernel.org/patch/10027103/ for recent discussion), it's possible to see memory pressure before it actually impacts the system and drives things into swap.
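To make the refault counter concrete, here's a sketch using made-up numbers in the cgroup2 memory.stat format; on a real system you'd read /sys/fs/cgroup/&lt;your-cgroup&gt;/memory.stat instead of the here-doc:

```shell
# Illustrative only: sample memory.stat contents (all values invented).
# workingset_refault counts pages that were reclaimed and then had to be
# faulted back in -- the "removed stuff we actually need" signal.
refaults=$(grep '^workingset_refault' <<'EOF' | awk '{print $2}'
anon 8192000
file 4096000
workingset_refault 1234
workingset_activate 567
workingset_nodereclaim 0
EOF
)
echo "refaulted pages: $refaults"
```

A refault count that keeps rising is the early-warning signal: reclaim is evicting pages the workload still needs.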
cdown | 8 years ago | on: In defence of swap: common misconceptions
Yeah, this is basically the main drawback of swap. I tried to address this somewhat in the article and the conclusion:
> Swap can make a system slower to OOM kill, since it provides another, slower source of memory to thrash on in out of memory situations – the OOM killer is only used by the kernel as a last resort, after things have already become monumentally screwed. The solutions here depend on your system:
> - You can opportunistically change the system workload depending on cgroup-local or global memory pressure. This prevents getting into these situations in the first place, but solid memory pressure metrics are lacking throughout the history of Unix. Hopefully this should be better soon with the addition of refault detection.
> - You can bias reclaiming (and thus swapping) away from certain processes per-cgroup using memory.low, allowing you to protect critical daemons without disabling swap entirely.
Try setting a reasonable memory.low on applications that require low latency/high responsiveness and see what the results are -- in this case, that's probably Xorg, your WM, and dbus.
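As a sketch of what that looks like on cgroup2 (the cgroup paths below are hypothetical -- substitute wherever your session actually places Xorg, your WM, and dbus, and run as root):

```shell
# Hypothetical cgroup paths -- adjust to your system. memory.low biases
# reclaim (and thus swapping) away from these cgroups without disabling
# swap system-wide.
echo 512M > /sys/fs/cgroup/session.slice/xorg.scope/memory.low
echo 128M > /sys/fs/cgroup/session.slice/dbus.service/memory.low
```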
cdown | 9 years ago | on: Facebook Messenger begins testing end-to-end encryption using Signal Protocol
cdown | 11 years ago | on: The Best Way to Organize a Lifetime of Photos
Sure, some of these applications store a copy both online and on disk, but if your application runs amok and deletes everything, that's not really going to matter in many cases. I've seen horrifying things happen when iPhoto tried to sync with the cloud. :-(
cdown | 4 years ago | on: Tmpfs inode corruption: introducing inode64
One of the earliest test versions of my patch actually did inode reuse using slabs just like you're suggesting, but there are a few practical issues:
1. Performance implications. We use tmpfs internally within the kernel in a lock-free manner as part of some latency-sensitive operations, and using slabs complicates that somewhat. The fact that the kernel itself makes internal use of tmpfs makes this situation quite different from other filesystems.
2. Back when I was writing the patch, each memory cgroup had its own set of slabs, which greatly complicated being able to reuse inodes as slabs between different services (since they run in different memcgs).
After it became clear that slab recycling wouldn't work, I wrote a test patch using IDA instead, but found that the performance implications were also untenable. There are other alternative solutions, but they increase code complexity and maintenance burden non-trivially and aren't really worth it.
A 64-bit per-superblock inode space resolves the issue without introducing any of these problems -- before you exhaust 2^64-1 inodes, you're going to hit other practical constraints anyway, at least for the time being :-)
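For reference, the resulting behaviour is exposed via the inode64 tmpfs mount option (assuming a kernel new enough to support it, and root; the mountpoint below is hypothetical):

```shell
# Sketch: with inode64, tmpfs allocates inode numbers from a full
# per-superblock 64-bit space, so they never wrap into collisions.
# Requires root and a kernel with tmpfs inode64 support.
mount -t tmpfs -o inode64 tmpfs /mnt/test   # /mnt/test: hypothetical mountpoint
touch /mnt/test/a /mnt/test/b
stat -c '%n %i' /mnt/test/a /mnt/test/b     # shows the allocated inode numbers
```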