I have a hand-coded backup system for my photo library that writes to S3. It runs every night at 2AM.
The one feature I have that's important to me is this: it figures out which files need to be uploaded, then uploads as many as possible for an hour, then stops.
That means it runs for at most an hour a night.
The reason I wanted this feature is that I might come home from a trip with, say, 30 GB of photos. My cable internet uploads at around 1 GB an hour. I don't want the backup to saturate my connection for 30 hours straight; instead, it backs up a small amount every night for 30 days.
Am I the only one who wants a feature like this? I've never seen it in any other backup system. (An alternative might be a configurable bandwidth limit for uploads.)
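For what it's worth, the core of such a feature is just a deadline check between uploads. A minimal sketch, with placeholder names (the `upload` callable and the pending-file list are not any real tool's API):

```python
import time

def upload_for(pending, upload, budget_seconds=3600):
    """Upload pending files until the time budget is spent.

    `pending` is an iterable of paths not yet on the remote, and
    `upload` is whatever callable actually pushes one file. Anything
    left over is simply picked up by the next nightly run.
    """
    deadline = time.monotonic() + budget_seconds
    done = []
    for path in pending:
        if time.monotonic() >= deadline:
            break  # budget spent: don't start new uploads
        upload(path)
        done.append(path)
    return done
```

Run nightly from cron, recomputing the pending set each time, this gives exactly the "at most an hour a night" behavior.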
Makes perfect sense. Restic kind of supports this, because you can just kill the client after an hour and, tomorrow, it'll see which objects are already there.
I'm not deep enough into the project to know whether this is an officially supported use case, but restic was of course made with the idea that interruptions can happen (your computer can crash) and should be handled safely. For deduplication it cuts files up in a deterministic way and thus (as I understand it) stores those chunks in a deterministic place.
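Restic's real chunker is based on Rabin fingerprints; the toy sketch below shows only the general idea of content-defined chunking (it is not restic's algorithm): boundaries depend on the bytes themselves, so unchanged regions of a file always produce the same chunks.

```python
import hashlib

def chunk_boundaries(data, window=16, mask=0xFF):
    """Toy content-defined chunking: cut wherever the hash of the
    preceding `window` bytes ends in a zero byte (mask 0xFF gives an
    average chunk of ~256 bytes). Cut positions depend only on
    content, which is what makes deduplication work across runs."""
    cuts = []
    for i in range(window, len(data)):
        h = hashlib.sha256(data[i - window:i]).digest()
        if h[-1] & mask == 0:  # low byte zero -> chunk boundary
            cuts.append(i)
    cuts.append(len(data))
    return cuts
```

Note the key property: appending data to a file leaves all earlier boundaries untouched, so only new chunks need uploading.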
Rclone will do exactly what you want: it uploads to S3, and --max-duration stops new transfers from starting after the given duration.
There are also throttle options for bandwidth. I use those combined with Node-RED and a smart plug on my monitors: if the monitors' power draw exceeds a threshold, the upload throttle is changed via the rclone API.
My internet upload speed is bad, so I do want something like that.
I would also like to be able to "stage" a backup: figure out what needs to be transmitted, then create the data files to be transmitted without actually transmitting them immediately.
That would allow me to do things like back up my laptop to another computer in my house, which can then upload the files over my slow connection overnight while my laptop is off; or bring the backup files somewhere with a fast connection (work, university, library) so that large backups, especially the initial one, don't take days or weeks.
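I don't know a mainstream tool that exposes this directly, but on a content-addressed store the staging step could look roughly like this (all names here are made up for illustration; `already_uploaded` stands in for whatever manifest the remote provides):

```python
import hashlib
import shutil
from pathlib import Path

def stage_backup(source_dir, staging_dir, already_uploaded):
    """Copy every file whose content hash the remote doesn't already
    have into `staging_dir`, named by hash. The staging directory can
    then be carried to a fast connection and uploaded from there.
    `already_uploaded` is the set of hex digests known to the remote."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in Path(source_dir).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in already_uploaded:
            shutil.copy2(path, staging / digest)
            staged.append(digest)
    return staged
```

Because the staged files are named by content hash, uploading them from anywhere produces the same repository state as a direct backup would.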
Unless it's for experimenting, I've stopped caring about backup solutions other than Borg and ZFS, since the only way to prove a tool's stability is for it to exist for a while without big complaints, and the new ones all seem to have complaints.
Having no data loss is the absolute baseline, but it isn't enough: huge memory consumption and other operational issues are also showstoppers.
Restic in my experience has been rock solid. I actually switched from Borg. Borg’s crypto has known limitations; its Python error messages are long and messy; it complained more frequently.
Restic’s repository format is simple and well documented, which is important for long term data recovery (and fixes in case changes occur in the repo). The crypto is from a good source, and well regarded. Multithreaded, fast, nice and clean output.
ZFS is a file system, and it has serious limitations when used as a backup tool. It needs a ZFS backend, which rules out almost every provider (basically you have to self-host your ZFS target, which is costly and error prone). It needs more RAM than Borg and restic. And I have personally felt uncomfortable with ZFS's native encryption for some time. Lower-level system encryption is probably not what you want in backups.
One feature I miss from these tools (other than ZFS): error correction. They could use a Reed-Solomon code or similar to add parity data in case of an accidental change in the repository.
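Full Reed-Solomon coding is more involved (the standalone par2 tool does exactly this for arbitrary files), but the underlying idea can be shown with single-erasure XOR parity, which can rebuild any one lost shard:

```python
def xor_parity(shards):
    """Byte-wise XOR of equal-length shards; acts as a parity shard."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(shards, parity):
    """Rebuild the one shard given as None by XOR-ing all the others
    with the parity shard (single-erasure recovery only)."""
    rebuilt = bytearray(parity)
    for shard in shards:
        if shard is None:
            continue
        for i, byte in enumerate(shard):
            rebuilt[i] ^= byte
    return bytes(rebuilt)
```

Reed-Solomon generalizes this: with k parity shards you can survive any k erasures, not just one.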
Restic and BorgBackup really seem to be the favored solutions out there: restic for encryption, Borg for deduplication and compression. Or maybe Bacula if you want pull-based backups instead of push-based.
'borg'[1] has, in recent years, become the de facto standard for secure, encrypted, you-control-the-keys backups. It has been referred to as "the holy grail of backups"[2].
Two of the better howtos that we have seen for borg are [3][4]. [4] is geared toward OpenBSD users.
There is also https://github.com/restic/others which has some keywords (e.g. is it encrypted, does it do compression) for most FOSS backup solutions. It can be outdated or incomplete for some entries, though.
restic supports a wide range of backends:
Local directory
sftp server (via SSH)
HTTP REST server (protocol, rest-server)
Amazon S3 (either from Amazon or using the Minio server)
OpenStack Swift
Backblaze B2
Microsoft Azure Blob Storage
Google Cloud Storage
And many other services via the rclone backend
I am using restic and thinking about switching to Kopia, mainly because Kopia has compression and seems to have more development activity. It also has a GUI, and from what I've seen it is faster.
This point hides a lot of goodness in something that I didn't even understand on the first read:
> - We have added checksums for various backends so data uploaded to a backend can be checked there.
All data is already stored in files named after the sha256sum of their contents, so clearly it's all already checksummed and can be verified, right?
Looking into the changelog entry[1], this is about verifying the integrity upon uploading:
> The verification works by informing the backend about the expected hash of the uploaded file. The backend then verifies the upload and thereby rules out any data corruption during upload. [...] Besides integrity checking for uploads, [this] also means that restic can now be used to store backups in S3 buckets which have Object Lock enabled.
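The mechanism in the quoted entry can be modeled roughly like this (a toy model, not restic's or S3's actual API): the client announces the expected digest up front, and the backend refuses any write whose bytes don't hash to it.

```python
import hashlib

class ChecksumVerifyingBackend:
    """Toy backend that rejects uploads whose received content doesn't
    match the digest the client claimed before uploading."""

    def __init__(self):
        self.objects = {}

    def put(self, name, data, expected_sha256):
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("checksum mismatch: upload corrupted in transit")
        self.objects[name] = data
```

The point is that corruption is caught at write time, on the backend side, rather than discovered later by a repository check.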
Object lock is mentioned in passing somewhere down the changelog, but it's a big feature. S3 docs:
> Object Lock can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
i.e. ransomware protection. Good luck wiping backups if your storage provider refuses to overwrite or delete the files. And you know Amazon didn't mess with the files, because they're authenticated.
Extortion is still a thing, but if people used this, it would more or less wipe out ransomware's attack vector. The only remaining risk is an attacker who stays in your systems long enough to outlast your retention period, creating useless backups in the meantime so you're not tipped off. Did anyone say "test your backups"?
For self-hosting, restic has a custom backend called rest-server[2], which supports a so-called "append-only mode" (no overwriting or deleting). I worked on the docs for this[3] together with rawtaz and MichaelEischer to make it more secure: eventually, of course, your disks fill up, or you want to stop paying for outdated snapshots on S3, and an attacker could have added dummy backups to fool your automatic removal script into keeping only the dummy backups. With the right retention options, this attack cannot happen.
Others are doing some pretty cool stuff in the backup sphere as well, e.g. bupstash[4] has public key encryption so you don't need to have the decryption keys as a backup client.
`timeout -s SIGINT 1h restic...`
That would let restic run for one hour; once the hour elapses, timeout sends a SIGINT, which stops the process cleanly (see https://github.com/restic/restic/blob/a29777f46794ea4e35548f...).
[1] https://borgbackup.readthedocs.io/en/stable/
[2] https://www.stavros.io/posts/holy-grail-backups/
[3] https://jstaf.github.io/2018/03/12/backups-with-borg-rsync.h...
[4] https://rgz.ee/borg.html
I'm a little unclear on one thing: are alternative S3 providers supported?
[1] https://github.com/restic/restic/releases/v0.13.0
[2] https://github.com/restic/rest-server/
[3] https://restic.readthedocs.io/en/latest/060_forget.html#secu...
[4] https://github.com/andrewchambers/bupstash/