item 16145294

In defence of swap: common misconceptions

150 points | c4urself | 8 years ago | chrisdown.name | reply

148 comments

[+] cosarara97|8 years ago|reply
In my experience, a misbehaving linux system that's out of RAM and has swap to spare will be unusably slow. The process of switching to a tty, logging in, and killing whatever the offending process is can easily take a good 15 minutes. Xorg will just freeze. Oh, and hopefully you know what process it is, else good luck running `top`.

Until this is fixed, I'll just keep running my systems with very small amounts of swap (say, 512MB in a system with 16GB of RAM). I'd rather the OOM killer kick in than have to REISUB or hold down the power button.

Some benchmarks with regards to the performance claims would be nice.

[+] cdown|8 years ago|reply
> In my experience, a misbehaving linux system that's out of RAM and has swap to spare will be unusably slow.

Yeah, this is basically the main drawback of swap. I tried to address this somewhat in the article and the conclusion:

> Swap can make a system slower to OOM kill, since it provides another, slower source of memory to thrash on in out of memory situations – the OOM killer is only used by the kernel as a last resort, after things have already become monumentally screwed. The solutions here depend on your system:

> - You can opportunistically change the system workload depending on cgroup-local or global memory pressure. This prevents getting into these situations in the first place, but solid memory pressure metrics are lacking throughout the history of Unix. Hopefully this should be better soon with the addition of refault detection.

> - You can bias reclaiming (and thus swapping) away from certain processes per-cgroup using memory.low, allowing you to protect critical daemons without disabling swap entirely.

Have a go setting a reasonable memory.low on applications that require low latency/high responsiveness and seeing what the results are -- in this case, that's probably Xorg, your WM, and dbus.
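For cgroup v2, a minimal sketch of that might look like the following (the `ui` group name and the 512M figure are just examples; assumes root and cgroup v2 mounted at /sys/fs/cgroup):

```shell
# Create a cgroup for latency-sensitive UI processes and protect
# ~512M of their working set from reclaim (and hence from swapping).
mkdir -p /sys/fs/cgroup/ui
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/ui/memory.low
# Move Xorg into it (one PID per write; repeat for your WM and dbus).
echo "$(pidof -s Xorg)" > /sys/fs/cgroup/ui/cgroup.procs
```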

[+] vanni|8 years ago|reply
You can use Alt+SysRq+f to manually call oom_kill.
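A sketch of enabling that first (the value 64 enables the process-signalling group of SysRq functions, which includes the manual OOM kill; needs root):

```shell
sysctl -w kernel.sysrq=64    # or: echo 64 > /proc/sys/kernel/sysrq
# Then Alt+SysRq+f on the console, or without a keyboard:
echo f > /proc/sysrq-trigger # invokes the OOM killer once
```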
[+] crististm|8 years ago|reply
In 2005 I was able to run Linux on 512MB RAM _without_ swap (on purpose - every day) without issues. Today it will bark at me on 8GB of RAM for not having swap enabled.
[+] kzrdude|8 years ago|reply
What function does the 0.5GB swap have?
[+] phire|8 years ago|reply
I actually did this on an old laptop: set up 200MB of swap for 4GB of RAM.

And it caused huge problems for me: the machine would run out of swap while still having plenty of free memory, and then go cripplingly slow.

[+] _ph_|8 years ago|reply
The point is that a system doesn't have to misbehave to allocate more memory than the total RAM. In those cases, there is a very good reason to have swap space, and swapping won't hurt the performance of the system; rather the opposite.
[+] quotemstr|8 years ago|reply
Very few people on this thread read and understood the article. The point isn't working with data sets larger than RAM. The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch.

Banning swap is like making self-storage companies illegal and forcing everyone to hold all their possessions in their homes. Sure, you'd be able to get to grandma's half-broken kitschy dog coaster that you can't bring yourself to throw away, but you'd also find it harder to fit and find your own stuff, the stuff you need all the time.

If you find yourself driving to and from the self storage place every day, you probably need a bigger home. But self storage is plenty useful even if you almost never visit it.

[+] _9jgl|8 years ago|reply
The issue is that the current OOM killer doesn't support this usage at all.

To extend the analogy: what do you do if grandma comes and fills your house with stuff? You need space to work, so you go and drop it off at the self storage place, but what if she just keeps filling your house up?

The OOM killer will do absolutely nothing until both your house and the whole self storage place are totally full. By that point, you've spent a huge amount of time just driving to and from self storage, so you haven't had time to do any actual work; it would probably have been better to tell grandma that you don't want any more stuff once she filled up your house for the first time.

[+] tmyklebu|8 years ago|reply
> Very few people on this thread read and understood the article.

Hmm. I read the article and I think I understood it. However, in my experience, you run out of RAM if and only if your working set is too big. In my experience, all involved find it desirable to reduce the size of the working set as quickly as possible. Your experience seems to differ.

> The point isn't working with data sets larger than RAM. The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch.

Your reasoning is too sloppy. It supports neither your blanket statements nor your pained analogy.

You appear to presuppose that:

(1) The kernel can predict which pages the user will "almost never touch."

(2) Mispredicting which pages will be "almost never touched" is of relatively low cost.

(3) Swapping pages that the user will "almost never touch" to disk frees up an appreciable amount of RAM.

(4) When pulling those pages back from disk, the work held up is, on average, less important than whatever we got to do with the RAM in the meantime.

I disagree with (1). Like I said elsewhere in the comments on this article, the kernel cannot reliably predict whether a process will "almost never touch" a given page. The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.

I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad. When lots of mispredictions happen in a tight cluster, the kernel and all running processes will be stopped when the user forcibly bounces the machine. If you let the OOM killer run instead of swapping, the kernel stays up and only a few running processes die. Having a working set whose size is larger than RAM but smaller than RAM + swap seems to be a recipe for a very long cluster of such mispredictions and a human intervention.

I am curious to hear about workloads where (3) occurs. (Non-latency-sensitive Java code that doesn't churn objects too fast? You've allocated a heap of a certain size, and the half or so that's free doesn't get disturbed too much.)

Regarding (4), even if the kernel could reliably predict cold pages, "page will almost never be touched" isn't necessarily the right criterion for swapping a page to disk. What if reading from the page will be on the critical path for something users do care about, such as logging in and killing a misbehaving process?

[+] WalterBright|8 years ago|reply
> self storage is plenty useful

With self-storage rising to over $300 per month, it's more cost effective to take the stuff to the dump and buy it again if it is ever needed.

[+] burnte|8 years ago|reply
"Very few people on this thread read and understood the article."

I started to read the article, and then thought, "I know this, who doesn't know this?" and stopped.

"The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch."

Exactly. Who with any technical experience in this day and age doesn't understand that? Are there really people trying to argue against swap?

[+] mikekchar|8 years ago|reply
Feel free to explain it to me.

" Under no/low memory contention

[...]

Without swap: We cannot swap out rarely-used anonymous memory, as it’s locked in memory. While this may not immediately present as a problem, on some workloads this may represent a non-trivial drop in performance due to stale, anonymous pages taking space away from more important use."

Now imagine that I have no memory contention. In other words, I've got 8 gigs of memory and I have never run out. The OOM killer has never run; I've never even come close. How exactly does this represent a non-trivial drop in performance?

To be fair, if I put some of my long running processes into swap, I could cache more files, but I really don't see how this represents a statistically significant improvement. I honestly can't think of anything else.

If you sometimes run out of memory (or even get close), then you should have some swap. This seems fairly obvious to me. Relying on the OOM killer to "clean things up" is pretty dubious. But was there ever any serious argument to do this? I've literally never heard of that before.

I'd be very happy to hear something enlightening about this, but I didn't see anything in the article (perhaps I missed it).

[+] wahB4vai|8 years ago|reply
I'm a big fan of determinism and service uniformity. Having that rarely used but response-time-critical function/data/whatever swapped out increases service-time variation at best, and at the very least complicates all worst-case response-time calculations.

I understand that in the land of JIT compilers, garbage collectors, and oversubscribed everything, this is not much of a concern, as those properties have already been traded away.

Swap may be the best you can do in a bad situation. I would argue along the lines of: don't be in a bad situation...

I'm looking at you 8 of 16 GB used on cold boot Mac laptops... Looking at you with indignation and rancor Chrome.

[+] caf|8 years ago|reply
One of the article's points is that running without swap doesn't necessarily alleviate that. The rarely-used code pages of your rarely-used but response time critical daemon can just as easily be dropped from the page cache and have to be refaulted in from disk, and in fact that's more likely if there isn't swap available to stow the dirty anonymous pages from the cron daemon that wakes up once a day or whatever.

The solution for your rarely-used but response time critical daemon is for it to mlock() its critical data and code pages into memory, which works regardless of whether or not you have swap available. (Or, alternatively, use one of the cgroup controllers that the article alludes to, to give the critical daemon and related processes memory unaffected by memory pressure elsewhere in the system).
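The cgroup alternative might be sketched like this, with systemd as the cgroup manager (the daemon path and the 256M figure are made up for illustration):

```shell
# Run the daemon in its own scope, with 256M protected from reclaim.
systemd-run --scope -p MemoryLow=256M /usr/local/bin/critical-daemon
# Or, for an existing systemd service:
systemctl set-property critical-daemon.service MemoryLow=256M
```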

[+] perlgeek|8 years ago|reply
> Under temporary spikes in memory usage

> With swap: We’re more resilient to temporary spikes, but in cases of severe memory starvation, the period from memory thrashing beginning to the OOM killer may be prolonged. We have more visibility into the instigators of memory pressure and can act on them more reasonably, and can perform a controlled intervention.

Somehow that doesn't resonate with my experience. I tend to remember the cases where I can't even SSH into the box, because the fork in sshd takes minutes, as does spawning the login shell.

I'd really like some way to have swap, but still set the OOM killer loose on the biggest memory hog when the system slows down to a crawl. I haven't found that magic configuration yet.

[+] JdeBP|8 years ago|reply
Well the article does suggest a mechanism to apply.

As for the problem with SSH and login: You might well find that it is not the fork that is the problem. You might well be surprised at how much chaff is run by a login shell, or even by non-login shells.

A case in point: I recently reduced the load on a server system that involved lots of SCP activity by noticing that, thanks to RedHat bug #810161, every SCP session, even though it was a non-login, non-interactive shell, was first running a program to enumerate the PCI bus, in order to fix a problem with a Cirrus graphics adapter card that the machine did not have, on a desktop environment that the machine did not have. This was driven by /etc/bashrc sourcing /etc/profile.d/* .

* https://github.com/FedoraKDE/kde-settings/blob/F-26/etc/prof...

[+] XorNot|8 years ago|reply
Swap iotime quotas maybe? I suspect the solution is a lot more complicated though - what I really want is a way to wall off my UI so it stays responsive during swap thrashing and lets me react to the situation.
[+] codesnik|8 years ago|reply
I remember not being able to SSH into the box too, but this stopped being the case (for me at least) some five years ago: I was able to log in to heavily swap-thrashing boxes without any problem. I figured something had changed in contemporary Linux distros; perhaps the login shell was given much higher priority, or something like that.
[+] avar|8 years ago|reply
Even for those who understand this well, it's historically been really hard to coerce the Linux kernel into applying the right swap policies to your application.

As the author notes, much of this has been improved by cgroups, and there have always been big hammers like mlock(), but even with those it can be hard to prevent memory thrashing in extreme cases. Because of that, I've seen swap disabled completely, as a last resort, by people who understood how it worked.

It's always seemed to me that this was mainly a problem of the kernel configuration being too opaque. Why can't you configure on a system-wide basis that you can use swap e.g. only for anonymous pages and for nothing else?

Similarly, it would be nice to have a facility for the OOM killer to call out to userspace (it would need dedicated reserved memory for this) to ask a configured user program "what should I kill?". You might also want to do that when you have 10G left, not 0 bytes.

[+] caf|8 years ago|reply
Swap is only used for anonymous pages (well, and dirty private file pages, which are basically the same thing).
[+] slaymaker1907|8 years ago|reply
I recently reenabled swap on my Windows machine due to frequent OOM, even with 16GB of RAM while playing Overwatch and browsing on Firefox. It seems like both of these programs allocate vast swaths of memory but then do not actually use that memory very heavily. After I turned swap back on, I did not notice any degradation in performance but my system stability skyrocketed.
[+] wilun|8 years ago|reply
Windows has vastly different policies for RAM allocation and commit than Linux. Windows basically does not overcommit, while Linux not only does so by default but quite depends on it for various loads to work properly. In consequence, userspace tends to handle RAM differently on each, but there is no magic: if programs try to allocate twice the amount of RAM and then only use half of it, Windows with a large enough swap will work perfectly, while without swap the allocation will fail. Under Linux, the situation is less clear: without swap the allocation will succeed (well, it depends on the fine details of the overcommit settings selected, but you get the idea), but if you then really use all that RAM, the OOM killer will more or less "randomly" kill "any" process to cope with the lack of RAM (as a last-resort measure, though; caches and buffers are flushed first, etc.)
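For the curious, the Linux overcommit behaviour described here is tunable; a sketch of the relevant knobs (values are examples; the writes need root):

```shell
cat /proc/sys/vm/overcommit_memory   # 0 = heuristic (default), 1 = always, 2 = never
sysctl -w vm.overcommit_memory=2     # strict accounting: large allocations fail up front
sysctl -w vm.overcommit_ratio=80     # commit limit = swap + 80% of physical RAM
```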
[+] black_puppydog|8 years ago|reply
Without wanting to impose my method or reasoning here, I run my dev machine without swap, and I'd rather have the same for the cluster machines I access.

This is for academic use only. I know how much RAM my machine has, and if I oom, it usually isn't because I tried to squeeze in just a tiny bit too much data, but rather because I made some stupid mistake and keep allocating small chunks of memory very rapidly. On a system with even a moderate amount of swap, this makes everything grind to a halt, and it is usually much faster to just reboot the machine and deal with the problems later in the unlikely event that rebooting actually causes problems.

[+] viraptor|8 years ago|reply
If you're running a single-(or close to)-purpose machine, then setting an explicit memory usage limit on the main app could give you even better/faster results.
[+] dboreham|8 years ago|reply
We've disabled swap (by not configuring a swap partition) on every server we've deployed since 2009. It's a little irritating to have to manually remove the swap partition from various Linuxes' "server" default install options even today. Of course this means I'm still installing on bare metal that I own, so... dinosaur.
[+] leoc|8 years ago|reply
I'd be more convinced by the argument that swap shouldn't be thought of as slow RAM if the author addressed the fact that it's generally known as 'virtual memory', and has been since at least the System/370, so it's not simply a later misconception: http://pages.cs.wisc.edu/~stjones/proj/vm_reading/ibmrd2505M... . Instead the article just omits the term 'virtual memory' completely, and pretty conspicuously.

I also think that a convincing case for swap would have to discuss the concepts of latency, interactivity, and (soft) real-time performance, things that largely weren't to the fore in the salad days of the 370 family or the VAX. Virtual memory is the TCP of local storage.

[+] JdeBP|8 years ago|reply
That is not the argument.

The article actually says, four times over, that it should not be thought of as emergency memory. It's not emergency memory; it's ordinary memory that should see use as part of an everyday memory hierarchy.

And if you are going to question the terminology, the elephant in the room that you have missed is calling paging swapping. (-:

[+] nicklaf|8 years ago|reply
I have an older Chromebook (c720), which is really quite memory starved (2GB RAM), and have experienced ChromeOS completely frying the SSD simply through prolonged tab-heavy swapping.

Now, I've replaced the SSD and installed a non-Google Linux distro, and would like to limit the amount of swapping Firefox can do.

I had been planning to simply use cgroups' memory features to limit the amount of memory consumed by Firefox processes, but if I understand the article correctly (which I admit I didn't read in full detail), I should also be able to tune swapping to limit the actual amount of swapping that takes place, avoiding a drastic uptick in SSD wear whenever I open too many tabs.

That, and perhaps a Firefox extension that suspends background tabs in memory (which I've used before with a certain amount of effectiveness in the pre-WebExtension days).
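A sketch of the cgroup approach via systemd (cgroup v2 assumed; the limits are only illustrative, and MemoryHigh throttles and reclaims before the hard MemoryMax cap is hit):

```shell
# Run Firefox in its own transient scope with a soft and a hard memory cap.
systemd-run --user --scope -p MemoryHigh=1200M -p MemoryMax=1500M firefox
```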

[+] reynhout|8 years ago|reply
ChromeOS does not use an on-disk swap partition. Your SSD died just because cheap SSDs like those typically found in Chromebooks die early. :(

ChromeOS uses zram instead of physical swap, which works quite well, even on 2GB models. Zram is available in any Linux distro, being built into the kernel, and is also the default configuration in GalliumOS (Ubuntu+Xfce for Chromebooks, most of which are less broadly compatible than your PEPPY).
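For anyone wanting to replicate that on a generic distro, a minimal zram-swap sketch (the size and compression algorithm are examples; needs root):

```shell
modprobe zram                              # creates /dev/zram0
echo lz4 > /sys/block/zram0/comp_algorithm # must be set before disksize
echo 2G  > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0                   # priority above any disk-backed swap
```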

[+] tombrossman|8 years ago|reply
Firefox has a couple options which may help you get by. I can't guarantee these will fix everything but they are worth experimenting with.

about:memory has various options, including a 'minimize memory usage' button and profiling tools.

about:preferences has Privacy & Security > Cached Web Content > Override automatic cache management (select and set at 500MB, 1GB, or whatever works best).

[+] bruce_one|8 years ago|reply
Setting the `swappiness` value might be of use to you.

Setting a lower value (min is 0, default is 60, max is 100) makes the kernel less inclined to swap: reclaim is biased toward dropping page-cache pages rather than swapping out anonymous pages, so memory has to be under heavier pressure before swapping starts. A lower number therefore means less swapping, and hence less SSD wear.
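A sketch of how to set it (the value 10 and the sysctl.d filename are only examples; writes need root):

```shell
sysctl vm.swappiness               # show the current value
sysctl -w vm.swappiness=10         # takes effect immediately, not persistent
echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf  # persist across reboots
```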

[+] CoconutPilot|8 years ago|reply
Swap was a great idea, but its time is gone. Swap doesn't make sense anymore, hard drives have not scaled and kept up with the improvements in RAM.

In the Pentium 1 era EDO RAM maxed out at 256MB/s and hard disk xfer was 10MB/s. Common RAM size was 16MB.

In today's era DDR4 maxes out at 32GB/s and hard disk xfer is 500 MB/s. Common RAM size is 16GB.

RAM xfer rate has grown ~128x (32GB/s vs 256MB/s). RAM capacity has grown ~1000x (16GB vs 16MB). Disk xfer rate has grown 50x.

Swap is no longer a useful tool.

[+] whopa|8 years ago|reply
This is an informative and well written article, but seems incomplete in this day and age. In public cloud environments, network attached storage is far more prevalent, so the swap story may be different there (I honestly don't know though). Since the author works at Facebook, he probably lacks experience in this regard.
[+] Anderkent|8 years ago|reply
Every cloud provider I've worked with (okay, so AWS :P) gives you ephemeral local storage. Obviously you don't swap onto a network drive.
[+] merb|8 years ago|reply
Well, the default Kubernetes install (kubeadm) will actually fail to install when swap is enabled. (Even worse, you can force it to ignore that, but kubelet will then fail to start while swap is enabled.)
[+] ohazi|8 years ago|reply
Does hibernating via a swap file work reasonably well yet? I haven't had a chance to try this out yet, but that's the main reason I still have a swap partition on my laptop.
[+] gerdesj|8 years ago|reply
Well done mate - you are the first person to mention this here. It was also only briefly mentioned in the article.

Yes, hibernation does work well and it requires swap. Personally, I set a swap partition equal to RAM + 512MB on systems that I want to hibernate on.

Linux also supports swap files and this might be handy: https://wiki.debian.org/Hibernation/Hibernate_Without_Swap_P...

[+] bhouston|8 years ago|reply
On Windows I have found it necessary to disable swap to keep myself efficient. Many times I've had applications decide to allocate massive amounts of memory, leading to my system slowing down with tons of swap activity. In nearly all cases, I didn't want my system to try its best to handle these massive memory requests; it should have just killed the offending application. Often in these failure scenarios the swap goes nuts and my computer becomes so unresponsive that it takes a long time to even get to kill the bad actor.

Thus I disabled swap and I never had these unresponsiveness issues again. I run with 32GB of RAM, so generally well-behaved applications never run into memory issues.

Some applications that would cause issues: too many VirtualBox instances using more than the available memory, or a text editor that chokes trying to open a >1GB text file (looking at you, new JS-based editors).

[+] maxxxxx|8 years ago|reply
Windows is terrible at swapping. As soon as you hit max RAM performance suffers a lot. Even if you never use inactive apps.
[+] javitury|8 years ago|reply
For many years now I've used RAM compression instead of swap for desktops/laptops. I particularly like zram, but zswap is also great if you are hitting hardware limits.

The difference from swap is that the computer doesn't become unresponsive, it just slows down a bit. And RAM compression still buys some time before the OOM killer hits.

[+] ibiza|8 years ago|reply
Count me among the believers in running w/ swap. Here's all it takes to provide Linux with a little swap space:

  fallocate -l 8G /swapfile
  chmod 0600 /swapfile
  mkswap /swapfile
  swapon /swapfile
Add an entry in /etc/fstab & you're done. "This little trick" made all the difference on a compute cluster I managed, where each node contained 96G of RAM. It's much more pleasant to monitor swap usage than the OOMKiller and related kernel parameters.
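For reference, the fstab step might look like this (sketch; run as root):

```shell
echo '/swapfile none swap sw 0 0' >> /etc/fstab
swapon --show   # verify the swapfile is listed and active
```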
[+] kartickv|8 years ago|reply
How long before desktop OSs manage memory like mobile ones, automatically shutting down background apps that aren't being used, so that the system remains responsive no matter what?

Or, worst case, if two tasks that need 8GB each are running on a machine with 8GB memory, kill one, let the other finish, and restart the first one. Or, less ambitious, freeze one, swap it out to disk, let the other finish, and only then resume the frozen app.

Desktop OSs are so primitive at memory management, forcing the complexity onto the user.

[+] vardump|8 years ago|reply
> How long before desktop OSs manage memory like mobile ones, automatically shutting down background apps

Once desktop systems and applications support required APIs to handle saving state before being shut down.

[+] viraptor|8 years ago|reply
I feel like the real metrics I'd like to see from swap usage are: how much time did I spend waiting to swap in a page, how many extra cache pages did it give me, and what's the cache hit ratio? If the big purpose is to allow those extra few pages to be available, then either it's worth doing or it's not; there should be an objective way to look at this. Unfortunately, only the second and third parts are easily available. The first... maybe via systemtap?
[+] cdown|8 years ago|reply
> how much time did I spend waiting to swap-in a page

You can do this with eBPF/BCC by using funclatency (https://github.com/iovisor/bcc/blob/master/tools/funclatency...) to trace swap related kernel calls. It depends on exactly what you want, but take a look at mm/swap.c and you'll probably find a function which results in the semantics you want.
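For instance, a hedged sketch (the exact function to trace varies by kernel version, so check mm/page_io.c on yours first; the tool may be packaged as funclatency-bpfcc on Debian-based distros):

```shell
# Print a latency histogram of the kernel swap-in path, sampled for 10 seconds.
funclatency -d 10 swap_readpage
```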

[+] kstenerud|8 years ago|reply
So, in other words, if you have enough memory for your workload that you won't run out, there's no benefit to having swap space (i.e. you've wasted money on memory you don't need).

But if you DO have swap space, there won't be a performance hit (at least not under Linux) because it will only swap out some rarely used pages and then sit there doing nothing.

So, in the general case, it's better to have it and not need it than need it and not have it.

[+] Aardwolf|8 years ago|reply
In defence against swap on my personal computer:

-PCs have a lot of RAM now

-When a program allocates that much memory, it's usually a bug in your own code, like a size_t that overflowed. I've never seen a program I actually want to use try to allocate that much

-When using swap instead of ram, everything becomes so slow that you're screwed anyway. The UI doesn't even respond fast enough to kill whatever tries to use all that memory.

-How common is a situation where you usefully need more memory than your RAM size yet less than RAM+swap size? Usually if something needs a lot, it's really a lot (and, as mentioned above, not desirable)

-Added complexity of making an extra partition

-Added complexity if you want to use full disk encryption

-I do the opposite of using disk as ram: I put /tmp in a ramdisk of a few gigs

-Disks are slow and fast SSDs are expensive, so you wouldn't want to sacrifice their space (maybe if this changes some day...)
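For what it's worth, the /tmp-in-RAM setup from the list above is a one-liner in /etc/fstab (the 4G size is an example; run as root):

```shell
echo 'tmpfs /tmp tmpfs size=4G,mode=1777 0 0' >> /etc/fstab
mount /tmp   # or reboot; mounts /tmp as a 4G ramdisk
```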