They mention in the article that some people don't want to install the full Plakar backup software just to read and write ptar archives, so a dedicated open-source tool is offered for download as of yesterday:
Another similar archive format is WIM, the format Microsoft created for the Windows Vista (and newer) installer; an open-source implementation is at: https://wimlib.net/
It offers similar deduplication, indexing, per-file compression, and versioning advantages.
>By contrast, S3 buckets are rarely backed up (a rather short-sighted approach for mission-critical cloud data), and even one-off archives are rarely done.
This is a complete aside, but how often are people backing up data to something other than S3? What I mean is: if some piece of data is on S3, do people have a contingency for "S3 failing"?
S3 is so durable in my mind now that I really only imagine having an "S3 backup" if (1) I had an existing system (e.g. tapes), or (2) I needed multi-cloud redundancy. Other than that, once I assume something is in S3, I'm confident it's safe.
Obviously this was built over years (decades?) of reliability, and if your DRP requires alternatives, you should pursue them, but is anyone realistically paranoid about S3?
Perhaps reframe the problem: not data loss because S3's technical infrastructure failed, but data loss through one of the many other ways data can get zapped, or through which you might suddenly need it. For example:
- Employee goes rogue and nukes buckets.
- Code fault quietly deletes data, or doesn't store it like you thought.
- State entity demands access to data, and you'd rather give them a tape than your S3 keys.
I agree that with eleven nines or whatever it is of durability, a write to S3 is not going to disappoint you, but most data losses are about policy and personnel rather than infrastructure failures.
Backups don't just protect you from durability issues. They protect you from accidental deletion and malware, and even just serve as snapshots of what something looked like at a particular point in time.
The context this article suggests is that if your S3 bucket is your primary storage, it's possible you're not thinking about where the second copy of your data should live.
Yes, I am paranoid about S3. Not only could a once-in-a-lifetime event happen; an attacker could get in and delete all my data. Data could be accidentally deleted. Corrupted data could be written...
I’ve worked on a project with strict legal record-keeping requirements that had a plan for the primary AWS region literally getting nuked. But that was the only contingency in our book of plans that really required the S3 backup. We generally assumed that as long as the region still existed, S3 still had everything we put in it.
Of course, since we had the backups, restoration of individual objects would’ve been possible, but we would’ve needed to do it by hand.
AWS is an incredible company and S3 a best-in-class service. Blindly trust my business to their SLA? To everything with write access to the data? Hell, no.
Are people really using gzip in 2025 for new projects?
Zstd has been widely available for a long time. Debian, which is pretty conservative with new software, has shipped zstd since at least stretch (released 2017).
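One reason gzip keeps showing up regardless: it is in every base system and every standard library, while zstd usually means pulling in a third-party binding. A minimal round-trip sketch using only Python's stdlib `gzip` module (sample data is made up):

```python
import gzip

# gzip needs no extra dependency: it has shipped in Python's standard
# library for decades, whereas zstd typically requires an external package.
data = b"the quick brown fox jumps over the lazy dog\n" * 200

compressed = gzip.compress(data, compresslevel=9)
restored = gzip.decompress(compressed)

assert restored == data
print(f"{len(data)} -> {len(compressed)} bytes")
```

Whether that convenience outweighs zstd's speed and ratio is a per-project call, which is what the comment above is questioning.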
Having the entire backup as a single file is interesting, but does it matter?
Restic has a similar featureset (deduplicated encrypted backups), but almost certainly has better incremental performance for complex use cases like storing X daily backups, Y weekly backups, etc. At the same time, it struggles with RAM usage when handling even 1TB of data, and presumably ptar has better scaling at that size.
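The "X daily, Y weekly" retention logic mentioned above can be sketched in a few lines. This is a toy illustration of such a policy, not Restic's or ptar's actual implementation; the function name and parameters are made up:

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, keep_daily=7, keep_weekly=4):
    """Retain the newest snapshots for the last `keep_daily` days and
    the newest snapshot in each of the last `keep_weekly` ISO weeks
    (a simplified grandfather-father-son policy)."""
    keep = set()
    weeks_seen = set()
    newest_first = sorted(set(snapshot_dates), reverse=True)
    # Daily rule: the most recent snapshot dates.
    keep.update(newest_first[:keep_daily])
    # Weekly rule: newest snapshot in each recent ISO week.
    for d in newest_first:
        week = d.isocalendar()[:2]  # (ISO year, ISO week)
        if week not in weeks_seen:
            weeks_seen.add(week)
            keep.add(d)
            if len(weeks_seen) == keep_weekly:
                break
    return keep

# 30 consecutive daily snapshots: the policy keeps 7 dailies plus
# the newest snapshot from each of the last 4 ISO weeks.
dates = [date(2025, 1, 1) + timedelta(days=i) for i in range(30)]
kept = snapshots_to_keep(dates)
print(sorted(kept))
```

Selecting what to keep is the easy part; the expensive part, and where the RAM pressure comes from, is repacking the shared chunks that only expired snapshots still reference.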
Yes, Plakar works much like Restic and Kopia: it takes content-addressed, encrypted, and deduplicated snapshots and offers efficient incremental backups via a simple CLI. Under the hood, its Kloset engine splits data into encrypted, compressed chunks. Plakar's main strengths:
UI: In addition to a simple Unix-style CLI, Plakar provides a web interface and API for monitoring and browsing snapshots
Data-agnostic snapshots: Plakar’s Kloset engine captures any structured data—filesystems, databases, applications—not just files, by organizing them into self-describing snapshots
Source/target decoupling: You can back up from one system (e.g. a local filesystem) and restore to another (e.g. an S3 bucket) using pluggable source and target connectors
Universal storage backends: Storage connectors let you persist encrypted, compressed chunks to local filesystems, SFTP servers or S3-compatible object stores (and more)—all via a unified interface
Extreme scale with low RAM: A virtual filesystem with lazy loading and backpressure-aware parallelism keeps memory use minimal, even on very large datasets
Network- and egress-optimized: Advanced client-side deduplication and compression dramatically cut storage and network transfer costs—ideal for inter-cloud or cross-provider migrations
Online maintenance: you don't need to stop your backups to free up space
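The content-addressed chunking described above can be illustrated with a toy store, assuming nothing about Plakar's real Kloset internals: chunks are keyed by a hash of their contents, so a second snapshot sharing data with the first adds no new chunks. Fixed-size chunks and zlib stand in here for Plakar's content-defined chunking, compression, and encryption:

```python
import hashlib
import zlib

CHUNK_SIZE = 4096

class ChunkStore:
    """Toy content-addressed store: identical chunks are written once."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> compressed chunk

    def put(self, data: bytes) -> list:
        """Split data into fixed-size chunks, store the new ones, and
        return the manifest of digests needed to rebuild it."""
        manifest = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # deduplication happens here
                self.chunks[digest] = zlib.compress(chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest) -> bytes:
        return b"".join(zlib.decompress(self.chunks[d]) for d in manifest)

store = ChunkStore()
payload = b"A" * 8192 + b"B" * 4096
snapshot1 = store.put(payload)
n_after_first = len(store.chunks)
snapshot2 = store.put(payload)              # identical second snapshot
assert len(store.chunks) == n_after_first   # nothing new was stored
assert store.get(snapshot2) == payload
```

Only the manifests grow with repeated snapshots; the chunk store itself stays the same size, which is why client-side dedup cuts both storage and egress.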
tux1968|7 months ago
https://plakar.io/posts/2025-07-07/kapsul-a-tool-to-create-a...
throwaway127482|7 months ago
winrid|7 months ago
msgodel|7 months ago
chungy|7 months ago
mrflop|7 months ago
nemothekid|7 months ago
kjellsbells|7 months ago
joshka|7 months ago
tecleandor|7 months ago
What if somebody deletes the file? What if it got corrupted due to a problem in one of your processes? What if your API key falls into the wrong hands?
SteveNuts|7 months ago
hxtk|7 months ago
jamesfinlayson|7 months ago
The backups themselves were off-limits to regular employees though - only the team that managed AWS could edit or delete the backups.
Spooky23|7 months ago
zzo38computer|7 months ago
treve|7 months ago
firesteelrain|7 months ago
ac29|7 months ago
kazinator|7 months ago
- tiny code size
- widely used standard
- fast compression and decompression
And it also beat Zstandard on compressing TXR Lisp .tlo files by a non-negligible margin. I can reproduce that today:
The .gzip file is 0.944 times as large as the .zstd file. So for this use case, gzip is faster (only zstd's decompression is fast), compresses better, and has a far smaller code footprint.
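The 0.944 figure is just the ratio of the two compressed output sizes, and that kind of comparison is easy to rerun on any corpus. A sketch of the gzip side using only the stdlib (the zstd side would need the external `zstd` tool or a binding, so it is omitted; the Lisp-ish sample data is made up, not the actual .tlo files):

```python
import gzip

def gzip_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size as a fraction of the original size."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)

# Stand-in corpus; which compressor wins is corpus-dependent,
# which is exactly why it is worth measuring on your own files.
sample = b"(defun walk (tree) (if (consp tree) (cons 1 tree) tree))\n" * 300

print(f"gzip -9 ratio: {gzip_ratio(sample):.3f}")
```

Dividing one tool's output size by the other's gives the same kind of ratio quoted above.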
gcr|7 months ago
Zpaq is quite mature and also handles deduplication, versioning, etc.
jauntywundrkind|7 months ago
Or eStargz. https://github.com/containerd/stargz-snapshotter
Or Nydus RAFS. https://github.com/dragonflyoss/nydus
Links for your mentioned zpaq and dwarFS https://www.mattmahoney.net/dc/zpaq.html https://github.com/mhx/dwarfs
Scaevolus|7 months ago
mkroman|7 months ago
There's also rustic, which supposedly is optimized for memory: https://rustic.cli.rs/docs/
ahofmann|7 months ago
mrflop|7 months ago
ptar...
throwaway127482|7 months ago
mrflop|7 months ago