Ask HN: How do you backup your linux system?

[+] gerdesj|8 years ago|reply

It .... depends.

Given I worry about this sort of thing for a living and am a partner in the firm: I think in terms of backup, DR, BC, availability and more. I have access to rather a lot of gear but the same approach will work for anyone willing to sit down and have a think and perhaps spend a few quid or at least think laterally.

For starters you need to consider what could happen to your systems and your data. Scribble a few scenarios down and think about "what would happen if ...". Then decide what is an acceptable outage or loss for each scenario. For example:

* You delete a file - can you recover it - how long

* You delete a file four months ago - can ...

* You drop your laptop - can you use another device to function

* Your partner deletes their entire accounts (my wife did this tonight - 5 sec outage)

* House burns down whilst on holiday

You get the idea - there is rather more to backups than simply "backups". Now look at appropriate technologies and strategies. eg for wifey, I used the recycle bin (KDE in this case) and bit my tongue when told I must have done it. I have put all her files into our family Nextcloud instance that I run at home. NC/Owncloud also have a salvage bin thing and the server VM I have is also backed up and off sited (to my office) with 35 days online restore points and a GFS scheme - all with Veeam. I have access to rather a lot more stuff as well and that is only part of my data availability plan but the point remains: I've considered the whole thing.

So to answer your question, I use a lot of different technologies and strategies. I use replication via NextCloud to make my data highly available. I use waste/recycle bins for quick "restores". I use Veeam for back in time restores of centrally held managed file stores. I off site via VPN links to another location.

If your question was simply to find out what people use then that's me done. However if you would like some ideas that are rather more realistic for a generic home user that will cover all bases for a reasonable outlay in time, effort and a few quid (but not much) then I am all ears.

[+] bluGill|8 years ago|reply

Put all personal data on a zfs z2 RAID system (FreeNAS). Take regular snapshots.

Someday I'm going to get a second offsite system to do ZFS backups to, but so far the above has served well. Then again I've been lucky enough to never have a hard drive fail, so the fact that I can lose 2 without losing data is pretty good. I'm vulnerable to fire and theft, but the most likely data loss scenarios are covered.

[+] RJIb8RBYxzAMX9u|8 years ago|reply

Even in 2017 you still can't sneakernet sometimes. I have a ZFS file server similar to yours, and it acts as the backup destination for all my other devices. Then, I use two USB HDDs to back up the server, then bring one to work. If either is attached a nightly script bring it up to date. I just keep rotation the two between home & work.

[+] Terribledactyl|8 years ago|reply

I do something very similar, my oldest build gets synched once a year and is in cold storage. I hope I never need it. The hard drives from my previous build hold cold copies of the most valuable volumes (monthly sync), and the current system has plenty of redundancy and snapshots (insurance, not really backups). I use zerotier to stay in sync when away from home.

[+] arantius|8 years ago|reply

This.

I've got incremental snapshots sent to a box in a family member's house (and vice versa).

[+] notenoughdhh|8 years ago|reply

[deleted]

[+] pmoriarty|8 years ago|reply

I use rdiff-backup.[1]

"rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership (if it is running as root), modification times, acls, eas, resource forks, etc. Finally, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted."

[1] - https://github.com/sol1/rdiff-backup

[+] atmosx|8 years ago|reply

Me too and I'm pretty happy with it. My retention period is 1Y for most systems.

[+] sanpi|8 years ago|reply

With borg (https://borgbackup.readthedocs.io/) and a custom script (https://raw.githubusercontent.com/sanpii/deploy/master/src/b...) to test pg backup and sync to another server.

[+] drbawb|8 years ago|reply

I use `btrbk` as a systemd service to snapshot my `/home` subvolume hourly & any other important subvolumes daily. `btrbk` manages the retention policy, which is roughly something like:

- daily snapshots for 1 week

- the first snapshot of every week for 4 weeks

- the first snapshot of every month for 2 years

- the first snapshot of every year for 5 years

Since I use entirely SSD storage I also have a script that mails me a usage report on those snapshots, and I manually prune ones that accidentally captured something huge. (Like a large coredump, download, etc. I do incremental sends, so I can never remove the most recent snapshot.)

Since snapshots are not backups I use `btrfs send/receive` to replicate the daily snapshots to a different btrfs filesystem on spinning rust, w/ the same retention policy. I do an `rsync` of the latest monthlies (once a month) to a rotating set of drives to cover the "datacenter burned down" scenario.

My restore process is very manual but it is essentially: `btrfs send` the desired subvolume(s) to a clean filesystem, re-snapshot them as read/write to enable writes again, and then install a bootloader, update /etc/fstab to use the new subvolid, etc.

---

Some advantages to this setup:

* incremental sends are super fast

* the data is protected against bitrot

* both the live array & backup array can tolerate one disk failure respectively

Some disadvantages:

* no parity "RAID" (yet)

* defrag on btrfs unshares extents and thus in conjunction with snapshots this balloons the storage required.

* as with any CoW/snapshotting filesystem: figuring out disk usage becomes a non-trivial problem

[+] blakesterz|8 years ago|reply

This comes up here quite a bit, lots of great answers in the past 2 discussions:

https://news.ycombinator.com/item?id=12999934

https://news.ycombinator.com/item?id=13694079

[+] portref|8 years ago|reply

I've long used rsnapshot for automated incremental backups, and manually run a script to do a full rsync backup with excludes for /tmp, /sys and the like to an external drive.

http://rsnapshot.org/

[+] stakent|8 years ago|reply

I use rsnapshot from cron to external USB drive encrypted with luks. I swap the drive with similar one weekly and store it offsite.

[+] blfr|8 years ago|reply

I don't back up the system. Drive failures are so rare nowadays that I will reinstall more often because of hardware changes.

The important stuff (projects, dotfiles) I keep on Tarsnap. I also rsync my entire home directory to an external drive every other week or so.

Similar for servers but I do back up /etc as well.

[+] gerdesj|8 years ago|reply

Drive failures are really rare until they happen when your data is personal experience of a few devices. When you worry about a data centre or n and or rather a lot of other systems with storage then you realise that it happens with monotonous regularity.

Now keeping a few short copies is also fine provided you don't make mistakes. Have you ever wanted to recover from a cock up you did six months ago or three years ago?

You do offsite (Tarsnap), so you have covered off local failures - cool.

Everyone's needs are different and the value they place on their data is different but I would respectfully suggest that you think really hard about how important some bits of your data are and protect them appropriately.

Fuck ups are hard to recover from 8)

[+] gorkonsine|8 years ago|reply

>I also rsync my entire home directory to an external drive every other week or so.

This is what I do as well, just not quite as often. Sometimes I wonder if I should switch to something like rdiff-backup to get snapshots, but that would only really be useful if I accidentally deleted a file and didn't notice for a while, for instance, which in practice is not a serious problem.

If I had more time and inclination, I might set up a small 2-drive RAIDed NAS box, and do automated regular backups to that. But for my laptop PC, just doing regular syncs to an external HD seems to be fine for now.

[+] icebraining|8 years ago|reply

Why would you reinstall due to hardware changes? My Debian installation has survived the move from two laptops already without any glitches.

[+] BeetleB|8 years ago|reply

>Drive failures are so rare nowadays that I will reinstall more often because of hardware changes.

2 HD failures in the last 4 years. Not rare for me :-(

[+] derekp7|8 years ago|reply

http://www.snebu.com -- something I wrote because I wanted snapshot style backups, but without the link farm that rsync-based snapshot backups produce. Snebu does file-level deduplication, compression, multi-host support, files are stored in regular lzo-compressed files, and metadata is in a catalog stored in a sqlite database.

Only real things missing is encryption support (working on that), and backing up KVM virtual machines from the host (working on that too).

[+] brensmith|8 years ago|reply

For both windows/linux/macOS boxen we use burp. Replaced a propietary program used for over 15 years. http://burp.grke.org/ I highly recommend it.

[+] cmurf|8 years ago|reply

Mostly local network storage which is backedup multiple times automatically, for the laptop I do manual btrfs send/receives manly to get things restored exactly the way they were.

#helps to see the fstab first

    UUID=<rootfsuuid>   /          btrfs   subvol=root 0 0
    UUID=<espuuid>      /boot/efi  vfat    umask=0077,shortname=winnt,x-systemd.automount,noauto 0 0

    cd /boot
    tar -acf boot-efi.tar efi/
    mount <rootfsdev> /mnt
    cd /mnt
    btrfs sub snap -r root root.20170707
    btrfs sub snap -r home home.20170707
    btrfs send -p root.20170706 root.20170707 | btrfs receive /run/media/c/backup/
    btrfs send -p home.20170706 home.20170707 | btrfs receive /run/media/c/backup/
    cd
    umount /mnt

So basically make ro snapshots of current root and home, and since they're separate subvolumes they can be done on separate schedules. And then send the incremental changes to the backup volume. While only incremental is sent, the receive side has each prior backup to the new subvolume points to all of those extents and is just updated with this backups changes. Meaning I do not have to restore the increments, I just restore the most recent subvolume on the backup. I only have to keep one dated read-only snapshot on each volume, there is no "initial" backup because each subvolume is complete.

Anyway, restores are easy and fast. I can also optionally just send/receive home and do a clean install of the OS.

Related, I've been meaning to look into this project in more detail which leverages btrfs snapshots and send/receive. https://github.com/digint/btrbk

[+] heywire|8 years ago|reply

Main home "server" is an Ubuntu system with a couple 2TB HDDs. It runs various services for IoT type stuff, has a few samba shares, houses my private git repositories, backups from Raspberry Pi security cameras, etc. It is backed up to the cloud using a headless Crashplan install. I use git to store dotfiles, /etc/ config files, scripts, and such, in addition to normal programming projects.

We back up photos from our iOS devices to this server using an app called PhotoSync. I also have an instance of the Google Photos Desktop Uploader running in a docker container using x11vnc / wine to mirror the photos to Google Photos (c'mon Google, why isn't there an official Linux client???). I'm really paranoid about losing family photos. I even update an offsite backup every few weeks using a portable HDD I keep at the office.

[+] gerdesj|8 years ago|reply

"I'm really paranoid about losing family photos"

No you aren't paranoid. Sensible. However, you've only just started. Try and do a restore every now and then from the HDDs.

There is no such thing as paranoia when it comes to protecting your data.

[+] alyandon|8 years ago|reply

CrashPlan for /home

Plain old btrfs snapshot + rsync to local usb drive and offsite host for /etc, /var, /root

[+] towb|8 years ago|reply

Just keeping my home folder safe, everything else is up and running in less than one hour if I had to start over. Daily backups with borg backup plus a couple of git repos for the dotfiles. All the important stuff is small enough for this, my backup is like 25 gb (a lot of random crap included), and all the photos and videos we used to worry about a few years ago is up in some unlimited sized google cloud for free. Times are pretty good :)

[+] binaryapparatus|8 years ago|reply

Excellent rsync-time-backup for local machine, backing up etc and home to external disk: https://github.com/laurent22/rsync-time-backup

Duply for servers, keeping backups on S3: http://duply.net/

Cron does daily DB dumps so Duply stores everything needed to restore servers.

[+] paol|8 years ago|reply

- Dirvish[0] for local backups (nightly)

- Crashplan[1] for cloud backups (also nightly; crashplan can backup continuously but I don't do that)

Pretty happy with it, though dirvish takes a little bit of manual setup. Never had to resort to the cloud backups yet.

[0] http://www.dirvish.org/

[1] https://www.crashplan.com/

[+] beeforpork|8 years ago|reply

Backup to external hard drive with btrfs. Rsync is used to copy the full source file system, with short exception lists for stuff I don't want backupped. After the sync, a btrfs snapshot is taken to get history. These napshots are removed with an exponential strategy (many snapshots are recent, few are old, the oldest is always kept), keeping ~30 snapshots a year.

Backup takes ~10min for searching 1TB of disk space. The daily diff is typically 6..15 GB, mostly due to braindead mail storage format...

I want to keep it simple but still have full history and diff backup: no dedicated backup tool, but rsync + btrfs. A file-by-file copy is easy to check and access (and the history also looks that way).

If the source had btrfs, I would use btrfs send/receive to speed it up and make it atomic.

I have two such backup disks in different places. One uses an automatic backup trigger during my lunch break, the other is triggered manually (and thus not often enough).

The sources are diverse (servers, laptops, ...). The most valued one uses 2 x 1 TB SSDs in RAID1 for robustness.

All disks are fully encrypted.

[+] RJIb8RBYxzAMX9u|8 years ago|reply

> If the source had btrfs, I would use btrfs send/receive to speed it up and make it atomic.

btrfs subvolume snapshot / send are not atomic, for various definitions of atomic.

Unlike zfs, subvolume snapshots are not atomic recursively. That is, if you have subvol/subsubvol, there's no way to take an atomic snapshot of both. At least this one is obvious, since there's no command for taking recursive snapshots, so it tips you off that this is the case. Not having an easy way to take recursive snapshots, atomic or not, is a different pain point...

What's more insidious is that after taking the snapshot, you must sync(1)[0] before sending said snapshot, otherwise the stream would be incomplete! I'm invoking Cunningham's Law here and saying for the record that this is fucking retarded. I have lost files due to this...design choice.

Moreover, though is is probably a fixed bug, I used to have issues where subvolumes get wedged when I run multiple subvolume snapshot / send in quick succession. I'd get a random unreadable file, and it's not corruption (btrfs scrub doesn't flag it). Usually re-mounting the subvolume will fix it, and at worst re-mounting the while filesystem would fix it so far. I haven't had it happen for a while, but it's either due to my workaround -- good old sleep(60) mutex -- or because I'm running a newer kernel.

I can't wait until xfs reflink support is more mature: that'll get me 90% what I use btrfs for.

tl;dr: btrfs: here be dragons!

[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup

[+] b_emery|8 years ago|reply

No workflow, we just use BackupPC (http://backuppc.sourceforge.net/) - cant recommend it enough. Restores are easy, monitoring and automation on different schedules is all built in. It's really great.

[+] sliken|8 years ago|reply

Love backuppc, especially 4.X where they replace the somewhat ugly hack of using hardlinks with checksums. Recommended if you have a collection of systems, if it's just a single system I'd use rdiff-backup (if you have remote storage) or duplicity if you want to pay for remote storage (amazon, rackspace, or similar).

[+] paulborza|8 years ago|reply

BackupPC has helped me restore my RabbitMQ queues several times. Love BackupPC.

It has an ugly looking interface, but the core of the product is super reliable.

[+] Symbiote|8 years ago|reply

With rsync, using the --link-dest option to make a complete file tree using hard links to a remote server.

Cron runs it, on @reboot schedule. If the backup is successful, some (but not all) old backups are deleted. I delete some oldest preserved backups manually, if disk space runs low.

[+] arunpn123|8 years ago|reply

Most of my projects fit in my laptop hardisk. I have a dropbox subscription which syncs everything interesting in my laptop to dropbox servers. This setup has saved my work once when my old laptop died - I just bought a new laptop and synced everything from dropbox.

82 comments