The first two issues -- a lack of index and the fact that you can't seek within a deflated tarball -- are true but are easily handled by smarter compression. Tarsnap, for example, splits off archive headers and stores them separately in order to speed up archive scanning.
The third issue -- lack of support for modern filesystem features -- is just plain wrong. Sure, the tar in 7th edition UNIX didn't support these, but modern tars support modern filesystem features.
The fourth issue -- general cruft -- is correct but irrelevant on modern tars since the problems caused by the cruft are eliminated via pax extension headers.
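One concrete example of pax headers papering over the old format's cruft: the classic ustar header only has a 100-byte name field, but a pax extended header carries a `path` record of arbitrary length. A minimal sketch using Python's stdlib `tarfile` (the file name here is synthetic, just for illustration):

```python
import io
import tarfile

# A path far longer than the 100-byte name field in a classic ustar header.
long_name = "deeply/" * 30 + "nested-file.txt"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
    info = tarfile.TarInfo(long_name)
    tf.addfile(info)  # tarfile emits a pax extended header with a 'path' record

buf.seek(0)
with tarfile.open(fileobj=buf) as tf:
    assert tf.getnames() == [long_name]  # full name survives the round trip
```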
The guy immediately loses credibility in my eyes for referring to the most popular archive format as 'WinZip'. It's the ZIP file format, designed by Phil Katz of PKWARE, Inc.
What this article describes has already been solved with zip, gzip, 7z, bzip2, and forks of tar.
The problem is that at the moment there is no open standard (there are IETF proposals), since each of these is either patent-, copyright-, or trademark-encumbered.
> Because tar does not support encryption/compression on the inside of archives.
Yes it does? Just encrypt/compress all the files before tarring.
> Not indexed
The reason tar doesn't have an index is so that tarballs can be concatenated. Also IIRC, you only have to jump through the headers for all files. Still O(n) where n is the number of files, but you don't have to scan through all of the data.
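That header-hopping scan is easy to sketch: each member starts with a 512-byte header whose size field (an octal string at offset 124) tells you how far to seek to reach the next header. A rough illustration in Python (`tarfile` is used only to build a test archive; the scanner itself seeks over payloads rather than reading them):

```python
import io
import tarfile

def scan_members(stream):
    """List (name, size) by hopping headers; payloads are seeked over, not read."""
    members = []
    while True:
        header = stream.read(512)
        if len(header) < 512 or header == b"\0" * 512:  # end-of-archive marker
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size_field = header[124:136].strip(b"\0 ")
        size = int(size_field, 8) if size_field else 0
        members.append((name, size))
        stream.seek((size + 511) // 512 * 512, io.SEEK_CUR)  # skip padded payload
    return members

# Build a small ustar archive in memory to scan.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for fname, payload in [("big.bin", b"x" * 100_000), ("small.txt", b"hi")]:
        info = tarfile.TarInfo(fname)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

buf.seek(0)
print(scan_members(buf))  # [('big.bin', 100000), ('small.txt', 2)]
```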
> The reason tar doesn't have an index is so that tarballs can be concatenated.
I'm curious, what's the use-case for this? Offhand, the only use for that ability I can think of is if I forgot a file in a tarball and have already deleted the originals; I can tar the missing file and cat the two tarballs.
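For what it's worth, the splice trick is easy to demonstrate: each archive ends in zero-filled end-of-archive blocks, and readers willing to skip zeros (GNU tar's `--ignore-zeros`, or `ignore_zeros=True` in Python's `tarfile`) read straight through the join. A small sketch with made-up member names:

```python
import io
import tarfile

def one_file_tar(name, payload):
    """Build a single-member tar archive in memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

# Naive byte-level concatenation, i.e. `cat a.tar b.tar > both.tar`.
combined = one_file_tar("a.txt", b"aaa") + one_file_tar("b.txt", b"bbb")

with tarfile.open(fileobj=io.BytesIO(combined), ignore_zeros=True) as tf:
    print(tf.getnames())  # ['a.txt', 'b.txt']
```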
Compressing before tarring is a really dumb idea and you will get terrible compression ratios -- you cannot exploit data patterns across files. It could work if you could ask gzip to write some sort of global table...
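The effect is easy to measure: per-file compression resets the compressor's model at every file boundary, so redundancy shared across files is never exploited, while compressing one concatenated stream (the tar-then-gzip order) captures it. A toy measurement with zlib (the data is synthetic, so the exact numbers will vary):

```python
import zlib

# Fifty small, highly similar "files" -- think generated config or source files.
files = [
    (f"file-{i}: " + "the quick brown fox jumps over the lazy dog\n" * 20).encode()
    for i in range(50)
]

per_file = sum(len(zlib.compress(f)) for f in files)   # compress each file alone
whole_stream = len(zlib.compress(b"".join(files)))     # compress the concatenation

print(per_file, whole_stream)  # the whole-stream result is much smaller here
assert whole_stream < per_file
```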
I think raising these concerns is fair in a world where nearly all Unix-related source code and binaries are distributed in (g/bzipped) TAR format. Unfortunately, the author does not really explain why this is, or what is wrong with ZIP (e.g. why a new format is needed).
TAR is old however, and if ZIP cannot take its place, coming up with something new is not such a bad idea. I think Apple's DMG/UDIF file format deserves to be mentioned as well: it addresses all the concerns mentioned (it is essentially a mountable filesystem). I'm pretty sure there is a lot to be learned from that.
"... Because tar does not support encryption/compression on the inside of archives ..."
That can be an advantage. Space isn't always what I want from backups -- I want the original data back, and compression gone wrong (tar -zxvf) is just another way to lose data.
That is exactly why the lack of in-archive compression is bad: with a compressed tarball you lose the whole rest of the archive on a single bit error, while with in-archive (per-file) compression you lose just the file the error is located in.
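The difference is easy to see with zlib: corrupt a byte inside a single compressed stream and everything from the damage onward is typically unrecoverable, while with independently compressed members only the damaged one is lost. A sketch (the corruption offset is arbitrary):

```python
import zlib

file_a = b"first file contents " * 500
file_b = b"second file contents " * 500

whole = zlib.compress(file_a + file_b)                    # tar-then-compress style
members = [zlib.compress(file_a), zlib.compress(file_b)]  # compress-per-file style

def flip_byte(blob, offset=10):
    bad = bytearray(blob)
    bad[offset] ^= 0xFF
    return bytes(bad)

# One flipped byte in the single stream: decompression typically fails outright.
try:
    zlib.decompress(flip_byte(whole))
except zlib.error:
    pass  # nothing past the damage comes back

# Same damage to one member: the other member is untouched.
try:
    zlib.decompress(flip_byte(members[0]))
except zlib.error:
    pass
assert zlib.decompress(members[1]) == file_b  # second file survives intact
```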
The pkzip format allows you to "zip" data uncompressed if you are worried about that. Then you can trivially unpack your files using nothing but seek and read for those cases where you also accidentally misplace your last copy of unzip.
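Python's `zipfile` shows the idea: with `ZIP_STORED`, member bytes land in the archive verbatim, so they can be recovered with nothing fancier than finding the local header and reading. A quick sketch:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("notes.txt", "plain bytes, stored verbatim")

raw = buf.getvalue()
# The payload sits in the file as-is -- no decompressor needed to spot it.
assert b"plain bytes, stored verbatim" in raw
```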
cperciva | 15 years ago
enneff | 15 years ago
http://en.wikipedia.org/wiki/ZIP_(file_format)
To add insult to injury, the rest of his proposal is pretty similar to ZIP, which also accomplishes the nice-to-have things he mentions at the end.
bl4k | 15 years ago
nailer | 15 years ago
* GNU tar?
* BSD tar?
* Solaris tar?
Or even Schilly's 'star' program?
Each of these has different limits, advantages, and disadvantages.
rarrrrrr | 15 years ago
anon_d | 15 years ago
gwern | 15 years ago
cybernytrix | 15 years ago
micheljansen | 15 years ago
I guess that one of the reasons for TAR's dominance is the lack of a free alternative? Apparently ZIP is not free enough (as I understand from http://en.wikipedia.org/wiki/ZIP_(file_format)#Standardizati...).
farmer_ted | 15 years ago
<http://code.google.com/p/xar/wiki/xarformat> <http://code.google.com/p/xar/wiki/whyxar>
But not with the nice descriptive graphics found in the new archive format proposal.
bootload | 15 years ago
fhars | 15 years ago
gaius | 15 years ago
http://en.wikipedia.org/wiki/Linear_Tape-Open#Compression
dagw | 15 years ago
masklinn | 15 years ago
Most if not all "compression" formats (and software) offer a "store" compressor which stores the data as-is, without applying any compression filter.
nanairo | 15 years ago
hernan7 | 15 years ago
joey_bananas | 15 years ago