I found squashfs to be a great archive format. It preserves Linux file ownership and permissions, it's mountable, and unlike tar it lets you extract individual files without parsing the entire archive. It can also be opened in 7-Zip.
I wonder how Pack compares to it, but its home page and GitHub don't tell much.
I haven't had a chance to use it yet, but https://github.com/mhx/dwarfs claims to be multiple times faster than squashfs, to compress much better, and to have full FUSE support.
WIM is the closest thing Windows has to a full file-based capture, but I've noticed that even that doesn't capture everything, unfortunately. I forget exactly, but I think it was extended attributes that DISM wouldn't capture, despite the /EA flag. I'm not sure if that was a file format limitation or just a DISM bug.
That's exactly what I'd like to avoid. I want to transfer a group of files (either to myself, friends, or website visitors), not make assumptions about the target system's permission set. For copies of my own data where permissions are relevant, I've got a restic backup
I agree very much with this. Something that annoys me is how much information tar files leak. Like, you don't need to know the username or groupname of the person that originally owned the files. You don't need to copy around any mode bit other than "executable". You definitely don't need "last modified" timestamps, which exist only to make builds that produce archives non-hermetic.
- As far as I know, squashfs is a file system and not an archive format; the "FS" in the name shows the focus.
As a separate note, had I encountered a pack.ac link anywhere on the internet other than here with a description attached, I'd have left it immediately. For me, it just lacks any info about what it is and why I should try it.
lathiat|1 year ago
For sosreports (archives with lots of diagnostic commands and logfiles from a Linux host), I wanted to find a file format that both uses zstd compression (or something else about as fast and compressible; they currently often use xz, which is very, very slow) and lets you unpack a single file fast via an index, ideally so you can mount it loopback or with FUSE, or otherwise just quickly unpack a single file from a many-GB archive.
You'd be surprised that this basically doesn't exist right now. There's a bunch of half solutions, but no really good, easily available one. Some things add indexes to tar. zstd does support partial/offset unpacking in the code without reading the entire archive, but basically no one uses that function, which is kind of silly. There are zip and rar tools with zstd support, but they are not all cross-compatible and mostly don't exist in the packaged Linux versions.
squashfs with zstd added mostly fits the bill.
I was really surprised not to find anything else given we had this in Zip and RAR files 2 decades ago. But nothing so far that would or could ship on a standard open source system managed to modernise that featureset.
(If anyone has any pointers let me know :-)
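To make the "add an index to tar" half-solution concrete, here is a minimal Python sketch. It assumes an uncompressed tar and ignores sparse files; `build_index` and `read_member` are made-up names for illustration, not part of any existing tool. With a compressed tar the byte offsets would no longer map to positions in the file, which is exactly the gap described above.

```python
import tarfile


def build_index(tar_path):
    """Scan the tar once and record where each regular file's data starts."""
    index = {}
    with tarfile.open(tar_path, "r:") as tar:  # "r:" opens uncompressed tars only
        for member in tar:
            if member.isfile():
                # offset_data is the byte position of the file contents
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(tar_path, index, name):
    """Fetch one file with a single seek, without walking the archive again."""
    offset, size = index[name]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

The index would normally be saved next to the archive (for example as JSON) so the one-time scan is not repeated on every lookup.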
OttoCoddo|1 year ago
`pack -i ./test.pack --include=/a/file.txt`
or a couple files and folders at once:
`pack -i ./test.pack --include=/a/file.txt --include=/a/folder/`
Use `--list` to get a list of all files:
`pack -i ./test.pack --list`
Such random access using `--include` is very fast. As an example, if I want to extract just a .c file from the whole codebase of Linux, it can be done (on my machine) in 30 ms, compared to nearly 500 ms for WinRAR or 2500 ms for tar.gz. And the gap only gets worse once you count encryption. For now, Pack encryption is not public, but when it is, you will be able to access a file in a locked Pack file in a matter of milliseconds rather than seconds.
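For anyone who wants to reproduce this kind of comparison, a rough Python sketch of a timing harness follows. It does not measure Pack; it only contrasts a sequential format (tar.gz) with an indexed one (zip) using the standard library, and the helper names are invented for illustration. The numbers will depend entirely on the machine and the archive.

```python
import tarfile
import time
import zipfile


def read_from_targz(path, name):
    # gzip is one continuous stream, so tarfile has to decompress its way
    # through the archive to locate the requested member.
    with tarfile.open(path, "r:gz") as tar:
        return tar.extractfile(name).read()


def read_from_zip(path, name):
    # zipfile reads the central directory at the end of the file, then seeks
    # to the member and decompresses only that entry.
    with zipfile.ZipFile(path) as zf:
        return zf.read(name)


def time_call(fn, *args):
    """Return (result, elapsed milliseconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

A single call is noisy, so repeating each measurement and dropping the first (cold cache) run would give a fairer picture.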
pdimitar|1 year ago
toomuchtodo|1 year ago
Example: https://alexwlchan.net/2019/working-with-large-s3-objects/
nolist_policy|1 year ago
qwerty456127|1 year ago
If only 7zip could also create them on Windows (it apparently can create WIM, which seems to be a direct Windows-native counterpart and is also mountable on Linux).
dataflow|1 year ago
benibela|1 year ago
I was trying to change a single file in a squashfs container recently and could not find a way to do that.
Aachen|1 year ago
Wake me up if a simple standard comes to pass that neither has user/group ID, mode fields, nor bit-packed two-second-precision timestamps or similar silliness. Perhaps an executable bit for systems that insist on such things for being able to use the download in the intended way
(I self-made this before: a simple length-prefixed concatenation of filename and contents fields. The problem is that people would have to download an unpacker. That's not broadly useful unless it is, as in that one case, a software distribution which they're going to run anyway)
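A format like the one described could look roughly like this in Python. This is a guess at the idea, not the original implementation: the 8-byte little-endian length prefix and UTF-8 names are assumptions, and `pack`/`unpack` are illustrative names unrelated to the Pack tool discussed elsewhere in the thread.

```python
import io
import struct

_LEN = struct.Struct("<Q")  # assumed: 8-byte little-endian unsigned length


def _write_field(out, data):
    out.write(_LEN.pack(len(data)))
    out.write(data)


def _read_exact(inp, n):
    data = inp.read(n)
    if len(data) != n:
        raise ValueError("truncated archive")
    return data


def pack(files, out):
    """files: iterable of (name, bytes) pairs; out: writable binary stream."""
    for name, content in files:
        _write_field(out, name.encode("utf-8"))
        _write_field(out, content)


def unpack(inp):
    """Yield (name, bytes) pairs until the stream ends cleanly."""
    while True:
        header = inp.read(_LEN.size)
        if not header:
            return  # clean end of archive
        if len(header) != _LEN.size:
            raise ValueError("truncated archive")
        name = _read_exact(inp, _LEN.unpack(header)[0]).decode("utf-8")
        size = _LEN.unpack(_read_exact(inp, _LEN.size))[0]
        yield name, _read_exact(inp, size)


# Round trip through an in-memory buffer.
buf = io.BytesIO()
pack([("hello.txt", b"hi\n"), ("bin/tool", b"\x00\x01")], buf)
buf.seek(0)
entries = list(unpack(buf))
```

There is no owner, mode, or timestamp anywhere in the stream. A real unpacker would still have to reject names such as `../x` or absolute paths before writing anything to disk.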
Brian_K_White|1 year ago
Sometimes you want to include data and sometimes you don't, for different reasons in different contexts. It's not a data handler's job to decide what data is or isn't included; it's the sender's job to decide what not to include and the receiver's job to decide what to ignore.
The simplest example is probably just the file path. tar or zip don't try to say whether or not a file in the container includes the full absolute path, a portion of the path, or no path.
The container should ideally be able to contain anything that any filesystem might have, or else it's not a generally useful tool, it's some annoying domain-specific specialized tool that one guy just luuuuuvs for his one use-case he thinks is obviously the most rational thing for anyone.
If you don't want to include something like a uid, say for security reasons not to disclose the internal workings of something on your end, then arrange not to include it when creating the archive, the same way you wouldn't necessarily include the full path to the same files. Or outside of a security concern like that, include all the data and let the recipient simply ignore any data that it doesn't support.
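With tar, "arrange not to include it when creating the archive" can be done today. The sketch below uses the `filter` argument of Python's `tarfile.TarFile.add`; the `scrub` and `make_archive` names and the choice of 0o755/0o644 as the only two modes are assumptions for illustration.

```python
import tarfile


def scrub(info):
    """Drop owner and timestamp metadata; keep only an executable bit."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    if info.isdir() or info.mode & 0o111:
        info.mode = 0o755
    else:
        info.mode = 0o644
    return info


def make_archive(paths, out_path):
    """Write an uncompressed tar whose members carry no sender metadata."""
    with tarfile.open(out_path, "w") as tar:
        for path in sorted(paths):  # sort the given paths for a stable order
            tar.add(path, filter=scrub)
```

The container still has the fields, so a sender who does want ownership preserved simply skips the filter. The path stored for each member is likewise the sender's choice, via the `arcname` argument.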
jrockway|1 year ago
Frankly, I don't even want any of these things on my mounted filesystem either.
> The problem is that people would have to download an unpacker.
Your archive format just needs to be an executable that runs on every platform. https://github.com/jart/cosmopolitan is something that could help with that. ("Who would execute an archive? It could do anything," I hear you scream. Well, tell that to anyone who has run "curl | bash".)
lmz|1 year ago
Other than having timestamps isn't this a ZIP file? No user id, no x bit, widely available implementations... Not very simple though I guess.
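For context on the "bit-packed two-second-precision timestamps": the base ZIP format stores modification time as two 16-bit MS-DOS fields, with seconds divided by two so they fit in five bits. A small Python sketch of that packing (the function names are made up here; extra fields can add finer timestamps, but they are optional):

```python
def dos_encode(year, month, day, hour, minute, second):
    """Pack a timestamp into ZIP's two 16-bit MS-DOS date and time fields."""
    date = ((year - 1980) << 9) | (month << 5) | day
    time = (hour << 11) | (minute << 5) | (second // 2)  # seconds are halved
    return date, time


def dos_decode(date, time):
    """Unpack the two fields; odd seconds come back rounded down to even."""
    return (
        (date >> 9) + 1980, (date >> 5) & 0x0F, date & 0x1F,
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,
    )
```

Encoding 13:45:59 and decoding it again yields 13:45:58, which is where the two-second precision comes from. Years before 1980 cannot be represented at all.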
OttoCoddo|1 year ago
- It is read-only; Pack is not. Update and delete are just not public yet, as I wanted people to get a taste first.
- It is clearly focused on archiving, whereas Pack wants to be a container option for people who want to pack some files/data and store or send them with no privacy dangers.
- Pack is designed to be user-friendly for most people; the CLI is very simple to work with, and future OS integration will make working with it a breeze. It is far different from a good file system focused on Linux.
- I did not compare to squashfs, but I will be happy to see any results from interested people.
My bet is on Pack, obviously, to be much faster.
dur-randir|1 year ago
- Being read-only is mostly a benefit to an archive. Back when drives were small, I occasionally wanted to update a .rar, but in the last ~5 years I can't remember a case for it.
- it's fine, but don't think that others' use cases are invalid because of your vision
- mount is also a CLI interface
dur-randir|1 year ago
bravetraveler|1 year ago
bonki|1 year ago
unknown|1 year ago
[deleted]