Do we really need swap on modern systems?

[+] mwpmaybe|9 years ago|reply

My personal rules of thumb for Linux systems. YMMV.

* If you need a low-latency server or workstation and all of your processes are killable (i.e. they can be easily/automatically restarted without data loss): disable swap.

* If you need a low-latency server or workstation and some of your processes are not killable (e.g. databases): enable swap and set vm.swappiness to 0.

* SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

* Disk-backed desktops and other servers and workstations: accept the system/distro defaults, typically swap enabled with vm.swappiness set to 60. You can and likely should lower vm.swappiness to 10 or so if you have a ton of RAM relative to your workload.

* If your server or workstation has a mix of killable and non-killable processes, use oom_score_adj to protect the non-killable processes.

* Monitor systems for swap (page-out) activity.

[+] Animats|9 years ago|reply

Swapping should have disappeared years ago. At best, it gives the effect of twice as much memory, in exchange for much slower speed. It was invented when memory cost a million dollars a megabyte. Costs have declined since then. How much does doubling the memory cost today?

What seems to keep swap alive is that asking for more memory ("malloc") is a request that can't be refused. Very few application programs handle an out of memory condition well. Many modern languages don't handle it at all. Nor is it customary to check for a "memory tight" condition and have programs restrain themselves, perhaps by starting fewer tasks in parallel, opening fewer connections, keeping fewer browser tabs in memory, or something similar.

I've used QNX, the real time OS, as a desktop system. It doesn't swap. This make for very consistent performance. Real-time programs are usually written to be aware of their memory limits.

Most mobile devices don't swap. So, in that sense, swapping is on the way out.

[+] AnthonyMouse|9 years ago|reply

> Nor is it customary to check for a "memory tight" condition and have programs restrain themselves, perhaps by starting fewer tasks in parallel, opening fewer connections, keeping fewer browser tabs in memory, or something similar.

These aren't mutually exclusive and are actually complementary with swap.

If you have more than enough memory then swap is unused and therefore harmless. The question is, what do you do when you run out? Making the system run slower is almost always better than killing processes at random.

And it gives processes more time to react to a low memory notification before low turns into none and the killing begins, because it's fine for "low memory" to mean low physical memory rather than low virtual memory.

It also does the same thing for the user. "Hmm, my system is running slow, maybe I should close some of these 917 browser tabs" is clearly better than having the OS kill the browser and then kill it again if you try to restore the previous session.

[+] thatcks|9 years ago|reply

Swap space is only partially related to virtual memory overcommit, and virtual memory overcommit is extremely common and almost unavoidable on most Unix machines. Part of this is a product of a deliberate trade-off in libraries between virtual address space and speed (for example, internally rounding up memory allocation sizes to powers of two), and part of this is due to Unix features that mean a process's theoretical peak RAM usage is often much higher than it will ever be in reality.

(For example, if a process forks, a great deal of memory is shared between the parent and child. In theory one process could dirty all of their writeable pages, forcing the kernel to allocate a second copy of each page. In practice, almost no process that forks will do that and reserving RAM (or swap) for that eventuality would require you to run significantly oversized systems.)

[+] euyyn|9 years ago|reply

Plus mobile apps do get, and usually handle, a low-memory notification from the OS.

[+] Gaelan|9 years ago|reply

Until Apple stops soldering on memory, swap will still be alive on the desktop.

[+] dredmorbius|9 years ago|reply

Memory allocation is a non-market operation on (most? all?) operating systems. There's effectively no cost to processes allocating memory, and a fair cost to them not doing so.

I'm not sure that turning this into a market-analagous operation (bidding some ... other scarce resource -- say, killability?) might make the situation better or worse. And the problem ultimately resides with developers. But as a thought experiment this might be an interesting place to go.

[+] scottlamb|9 years ago|reply

I hate swap. My experience with it is that once a disk-backed machine (as opposed to SSD) has started swapping, it's essentially unusable until you manually force all anonymous pages to be paged in by turning off swap ("sudo swapoff -a" on Linux) or reboot.

My hunch is that the OS is swapping stuff back in stupidly. Once memory is available, I'd like it to page everything back proactively, preferring stuff from swap and then from file-backed mmaps. But instead it seems to be purely reactive, each major page fault requiring a disk seek to page in what's needed with little if any readahead. Basically the whole VM space remains a minefield until you stumble over and detonate each mine in your normal operation. Much better to reboot and have a usable system again.

On my Linux systems, I've turned off swap.

On OS X...last I checked, I wasn't able to find a way to do this. I'd like to turn off swap entirely, or failing that, have some equivalent way to force all of swap to be paged in now so I don't have to reboot when I hit swap. Anyone know of a way?

[+] benibela|9 years ago|reply

Something seems to be seriously wrong with the swap implementation on modern systems.

20 years ago on Windows 98 it just started swapping, but it was no big deal. If something became too slow to be usable, you could just press ctrl+alt+del and kill that swapped program and everything worked fine afterwards.

On the other hand, my modern linux laptop, it starts swapping, and it swaps and swaps and you can do nothing, not even move the mouse, till 30 minutes later something crashes.

[+] bsdetector|9 years ago|reply

> on Windows 98 it just started swapping, but it was no big deal.

At that time, swapping out a 4k page was a significant part of memory: 4k of 16 MiB is 1/4096 of memory. Each swap gets back a lot of memory the program needs. Now the swap is still 4k pages, but memory has expanded by a thousand fold. Basically swap is a thousand times worse today than it was in the time of Windows 98.

For harddrives swap isn't used now to expand memory, it's used to remove initialization code and other 'dead' memory. Swap should be set to only a tiny fraction of the memory size for this reason, to prevent it from being used to handle actual out-of-memory conditions. But realistically for most users it's not even worth enabling at all because of the occasional memory that needs to be swapped in from disk.

For SSDs the seek speed has improved to match the extra memory so swap can still be used like in the old days to expand the effective memory size. But memory is so large a swap file that's a fraction of memory size to offload 'dead' memory is enough unless there's a specific reason to actually use swap for out-of-memory.

[+] throwawayish|9 years ago|reply

I have been using various operating systems for a while.

I feel like Linux has, in general, from a UX point of view, the worst behaviour when swapping and the worst behaviour in general under memory pressure.

I feel like it has gotten worse over time, which might not be just the kernel but the general desktop ecosystem. If you require much more memory to move the mouse or show the task manager equivalent, then the system will be much less responsive when it thrashes itself.

Honestly, I'ld much rather have Linux just crash and reboot, that'd be faster than it's thrashing-tantrums.

Luckily, there's earlyoom, which just rampages the town quickly if memory pressure approaches. Like a reboot (ie. damage was done), just faster.

In any case, it makes me sad (in a bad way) to see how bad the state of things is when it comes to the basics of computing, like managing memory.

[+] Asooka|9 years ago|reply

Because Windows 98 always kept enough resources available to show you the c-a-d dialog. On Linux, however, there is no "the shell must remain interactive at all times" requirement, so a daemon that gobbles memory and your rescue shell have the exact same priority. Modern Windows even has a graphics card watchdog and if any application issues a command to the GPU that takes too long, it's suspended and the user is asked if it should be killed. Probably not what you want on an HPC that does deep learning, but exactly what you want on an interactive desktop.

I suppose it might be possible to whip something up with cgroups and policy that will keep the VT, bash, X and a few select programs always resident in memory and give them ultimate I/O priority, but I haven't tried.

[+] plorkyeran|9 years ago|reply

This is the exact opposite of my experience. Back in the Windows 9x days it was a fairly routine experience for the system to soft-lock with the HD grinding away and I'd sometimes end up just hard rebooting the computer after waiting a few minutes for the ctrl-alt-delete dialog to appear. On macOS with a SSD I don't even notice when my system is swapping heavily.

[+] thiagobbt|9 years ago|reply

Isn't this related to this change on kernel 4.10? https://kernelnewbies.org/Linux_4.10#head-f6ecae920c0660b7f4...

[+] rootbear|9 years ago|reply

Could this be a reflection of the increasing gulf between RAM speed and HD speed? Even with NVMe drives, which one probably shouldn't be swapping to anyway, RAM is orders of magnitude faster.

[+] tedunangst|9 years ago|reply

0. Possibly not true in all cases. 1. Modern systems are much more aggressive about enormous disk caches, which can ironically lead to io storms when it swaps out your application to buffer writes, then has to flush the cache to swap the app back in. 2. Difference in working set size and number of background programs waking up.

[+] derefr|9 years ago|reply

What I've always been specifically confused about, is if there's any point in giving a VM a swap partition inside its virtual disk, rather than just giving it a lot of regular virtual memory (even overcommitting compared to the host's amount of memory) and then letting the host swap out some of that RAM to its swap partition.

Personally, I've never given VMs swap. I'd rather have memory pressure trigger horizontal scaling (or perhaps vertical rescaling, for things like DBMS nodes) than let Individual VMs struggle along under overloaded+degraded conditions.

[+] sirn|9 years ago|reply

One usage of swap in modern systems: hibernation. If you need to use hibernation, that means a swap must exists, either as a swapfile (pre-allocated, as uswsusp require a fixed offset on the disk to resume) or as a partition.

[+] lmm|9 years ago|reply

I've been reading these stories for ten years. About 8 years ago I started taking them seriously and stopped using swap. Turns out not having swap works much better. I'm amazed how slowly the consensus seems to be moving though.

[+] njharman|9 years ago|reply

Systems are used for vastly different purposes. With different memory usages and expected operation.

There can be no consensus because there is no one answer.

[+] jcrites|9 years ago|reply

We reached this same conclusion for our servers generally. The problem with swap is that it's unpredictable. It's better most of the time to have a system that's predictable. However much RAM is available to the system, you can deal with that, by making an appropriate choice of hardware type, or by scaling up, tuning software, etc. It's harder to deal with performance problems related to use of swap in my experience, since it's nondeterministic what will be swapped.

[+] problems|9 years ago|reply

Yeah. I've had issues with this on some systems.

On Windows without swap when you hit a remotely low on RAM point, things start going really poorly for some reason - random latency. So with 16 GB of RAM even I can't disable swap on Windows without some really strange performance characteristics, I run SSDs so I really wanted it off and I just stuffed more RAM in my box - with 32 GB it isn't a problem.

On Linux however, you can pretty much turn it off and everything will run smooth until you're actually out and then you lag badly briefly, Linux's oom-killer does its thing and all is good again within the span of a few seconds.

[+] rogerbinns|9 years ago|reply

Two examples of why I have swap:

* On a laptop to hibernate, which results in zero power consumption vs suspend which will drain the battery in a day or so

* I use tmpfs for /tmp and using swap as the backing is far more performant than regular filesystems

[+] tossaway1|9 years ago|reply

> I've been reading these stories for ten years. About 8 years ago I started taking them seriously and stopped using swap.

Not sure what you're referring to here. This story doesn't recommend eliminating swap...

[+] contingencies|9 years ago|reply

Ditto, and over that period memory has become even cheaper.

I sort of wonder if we'll see a 100% RAM, large memory laptop soon that boots from an SD-card or in a cryptographically secure fashion over 4G wireless networks, aggressively disables RAM for power saving and suspends well.

[+] pizzetta|9 years ago|reply

Aren't there legacy applications which expect swap where otherwise with modern applications swap isn't necessary? Or, at least that is my current (mis)-understanding...

[+] phil21|9 years ago|reply

This is by far my biggest pet peeve in the space. The "rule of thumb" that you need 2x RAM as swap. Even 10 years ago this "rule" was ancient and useless but it was always a constant challenge educating customers as to why, and that yes - we really did know better than your uncle Rob.

Once a server hits swap, it's dead. There is no recovering it other than for exceptional cases. If you are swapping out, you've already lost the battle.

I tend to configure servers with 512MB to 1GB swap simply so the kernel can swap out a couple hundred MB of pages it never uses - but that's really more to make people feel better than it really being useful at all.

[+] gravypod|9 years ago|reply

I wish we took the path of EROS [0] rather then "RAM and DISK are seperate". A lot of problems stem from that incompatable viewpoint of computing. Computer Science is about hiding complexity under lays of abstraction that continualy provide safer states and constraints on the things built on top of them. Our abstraction that RAM and DISK are seperate is not safer nor does it provide constraints that are simple to navigate. Thinking about this the other way, where DISK is all you need and memory is just a write-through cache, is much safer in my opinion and leads to some really cool application design.

If RAM and DISK are the same, then writing a file system is just writing an in-memory tree. No need to pull data from the disk, just navigate the tree in your program's memory and pull the blob data out. Want to persist acorss reboots, protect against power outages, or save user settings? Just set a variable and it'll be there.

The benifits are much better then the costs.

[0] - https://web.archive.org/web/20031029002231/http://www.eros-o...

[+] rogerbinns|9 years ago|reply

The AS/400 (or whatever they call it now) had an approach like that. Everything was on disk and RAM was just a cache of disk. That also meant every "object" had an address and could be accessed by any process with suitable permissions. There are lots of other things they do, with a very different approach than Unix, Windows etc.

Frank Soltis' book is recommended reading: https://www.amazon.com/dp/1882419669/

[+] vidarh|9 years ago|reply

The challenge with this is that abstracting away disk in a way that isn't horribly leaky is incredibly hard as long as one lets us manipulate individual bits and the other requires us to write whole sectors.

Note that EROS is not providing a write-through cache. It's providing a write-back cache using checkpointing coupled with a journalling capability and ability to explicitly sync data.

So it's leaky: Your application needs to know that it needs to structure it's writes to memory so that they will make sense if the system comes back up with some of the data missing, and needs to know how to use the journalling functionality.

It can't just act as if it's running in RAM forever.

[+] nobodyorother|9 years ago|reply

You might want to investigate Mumps:

https://en.wikipedia.org/wiki/MUMPS

Setting data in memory is the same as setting data on disk, the only difference is the name of the variable:

s X=1 ; store 1 in variable named X, in memory.

s ^X=X ; store 1 in variable named X, on disk.

s X=^X ; load disk to memory

[+] jandrese|9 years ago|reply

How is this different than just memory mapped files? I guess it happens a little more automatically, but it doesn't seem to really solve a major problem that I can see.

[+] mayoff|9 years ago|reply

Is iOS a modern system? Because iOS does not have swap.

> Although OS X supports a backing store, iOS does not. In iPhone applications, read-only data that is already on the disk (such as code pages) is simply removed from memory and reloaded from disk as needed. Writable data is never removed from memory by the operating system. Instead, if the amount of free memory drops below a certain threshold, the system asks the running applications to free up memory voluntarily to make room for new data. Applications that fail to free up enough memory are terminated.

https://developer.apple.com/library/content/documentation/Pe...

[+] sevensor|9 years ago|reply

My desktop at work has 16G of RAM. I didn't bother setting up swap, and I find the old guidance (2x RAM) pretty absurd at this point. I've had the OOM-killer render the system unresponsive a couple of times, but only because I'd written a program that was leaking memory and I was pushing it to misbehave. If you really want virtual memory on purpose, you can still set up a memory-mapped file for your big data structure.

[+] jerf|9 years ago|reply

Putting spinning-rust-backed swap on a 16G system is absurd. By the time such a system is into swap, it probably isn't trying to swap three or four megabytes, it's probably trying to swap three or four gigabytes, and that can literally take hours. Simply writing that much data to a hard drive can take a non-trivial amount of time, and swap doesn't generally just cleanly run out to the hard drive with nothing else interfering, it's a lot messier. Given the speeds of everything else involved, a 16GB RAM system trying to swap to a hard drive, even a good one to say nothing of those slow-writing SMR hard drives [1], is basically a system that has completely failed and it might as well just start OOM-killing things.

A system backed by an SSD does degrade more nicely, though. The system visibly slows down but doesn't go to outright unresponsive like it does on a hard drive. You can make a case for letting that happen and having human intervention select the processes to kill, rather than letting the kernel do it. So, even though it still isn't really useful as an extension of RAM, it can still be useful in recovering from systems that you've run yourself out of memory on. Since putting an SSD in my systems I've actually gone back to running with some swap space. Though the fact I like hibernation sometimes is also a reason I run with swap in Linux on my laptop.

[1]: Swap will almost certainly completely blow out the buffers on those things and you'll be stuck with the raw hardware write speeds pretty quickly.

[+] amyjess|9 years ago|reply

> I've had the OOM-killer render the system unresponsive a couple of times

Use earlyoom instead of relying on oom-killer.

https://github.com/rfjakob/earlyoom

To quote from the description:

> The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think what it will do. I have yet to be patient enough to wait for it.

[...]

> This made people wonder if the oom-killer could be configured to step in earlier: superuser.com , unix.stackexchange.com.

> As it turns out, no, it can't. At least using the in-kernel oom killer.

And earlyoom exists to provide a better alternative to oom-killer in userspace that's much more aggressive about maintaining responsivity.

[+] sddfd|9 years ago|reply

I don't have swap either. On 8GB it is pretty annoying, because a program I often use frequently overcommits and the system hangs.

Is there any way to tell the OOM killer which program to kill first?

[+] jlgaddis|9 years ago|reply

My new workstation has 128 GB of RAM. It also has 1 GB of swap (on NVMe) that, AFAICT, has never been touched. I use it as sort of a canary that something abnormal is happening if it starts being used.

[+] amyjess|9 years ago|reply

I haven't used swap in years, and more recently I've accompanied that by using earlyoom [0] to start killing processes when RAM usage rises above 90%.

Both changes have made my computers much more usable. Systems should designed to fail fast when memory is low instead of slowing down.

[0] https://github.com/rfjakob/earlyoom

[+] ChuckMcM|9 years ago|reply

One of the things we used at Blekko was that swap became a 'soft' indicator that something on the system had exceeded its foot print (our machines all had 96GB of RAM so it meant something had too much RAM) and OOM-killer messages in the log was grounds for taking the machine out and rebooting it and looking for a more serious problem (like sometimes things rebooted and had 32GB less RAM).

That said, the article's recommendation was spot on in terms of making a conscious decision on how you want your system to behave when its coming close to running out of memory. Large swap space was originally the way you got those things that were too big to fit in memory to run, and now they are a way to essentially batch process very large data sets.

[+] mixedbit|9 years ago|reply

If Linux has no swap, it doesn't quickly and efficiently kill processes when memory is exhausted. Instead it first removes executable code from RAM and reads it back from disk when needed. This is because without swap executable code is the only thing in RAM that is duplicated on disk and can be removed. This makes the system completely frozen and unusable.

[+] rcxdude|9 years ago|reply

This is my experience too. I used to run my desktop without swap, but found that the experience when running out of memory was even worse than with swap. Also there appears to be enough memory which isn't actually used frequently that it gives a bit more memory headroom (I will still manage to use up 32GB of RAM).

[+] phire|9 years ago|reply

Last time I tried running a linux system with zero swap, I ran into huge issues.

It would never actually hit the OoM killer, instead it would just lock up while it still technically had a few hundred mb of memory free.

From what I can tell, it was stuck in a loop evicting something from cache and then immediately pulling it back in from disk. Everything was technically still running, but the ui wasn't responsive enough for me to even kill a program.

Simply adding 200mb of swap would change the behaviour enough that the OoM killer would eventually run.

[+] phkahler|9 years ago|reply

I never understood the rule of thumb where swap space was proportional to the amount of physical RAM. It seem to me it should be the size of your largest expected allocation (system wide) minus the amount of physical RAM or something like that. If you had a nicely configured system and took out half the RAM it doesn't make sense that you'd want less swap space.

[+] jedberg|9 years ago|reply

My feeling on swap is this:

1) If you're ok with one machine dropping out of your system, you don't need swap.

2) You should never build a system where losing a single machine is a problem.

3) Therefore, you should never need swap

4) Perhaps there is an exception for a desktop machine, since it's doesn't fit rule 2.

[+] galdosdi|9 years ago|reply

Tend to agree.

A bit of a side ramble: Unfortunately, sometimes regarding rule 2, you already have a system where losing a single machine is a problem, and it will take time and resources to improve or replace it to the point where losing a single machine isn't a problem, so "in the meantime" you have to accept and support this.

Also, sometimes "the meantime" is very long. :-(

Also, by the time the system is improved to be more resilient, maybe you'll be working somewhere else or on something else, and, presto, you'll uncover some other horrible legacy system in your dependency chain that isn't resilient either. It seems as if at every organization that has had computers for long enough, there is an infinite supply of legacy systems.

Point being unless you only work with brand new things that themselves only work with brand new things, you can't get out of getting decent at managing services that aren't properly "any single machine can disappear" resilient

[+] perlgeek|9 years ago|reply

Doesn't that risk cascading failures?

A cluster of a few machines experiences a bunch of requests that trigger pathological memory usage. One machine OOMs, drops out. Now the rest of the cluster has to take more load, needs more memory, and increases the likelihood that the other machines also run out of memory.

[+] zumu|9 years ago|reply

How do you hibernate with no swap? Do you need a special hibernation partition to write to?

[+] rkeene2|9 years ago|reply

The main issue I have with not using swap in modern Linux is that it will cause the kernel to be busy for hours at a time. What happens is, as the kernel runs low on RAM, it has to spend more time searching for smaller and smaller chunks of RAM to back the request, the smaller chunks are more numerous and the "kswapd" kernel thread is responsible for this activity. As the system approaches 0 RAM free kswapd will also try to release less important pages, which takes more CPU time. Ultimately you get to the point where allocations take a really long time, and there are lots of allocations.

[+] rini17|9 years ago|reply

I recommend using swap together with zswap, and increase swappiness. Zswap is available in mainline kernel. It keeps compressed "swapped-out" pages in memory (so they are accessible quickly on page fault) and only uncompressible pages go to disk. Usually most of memory is compressible and overhead is small, so it is suitable for many workloads. See https://wiki.archlinux.org/index.php/Zswap .

249 comments