top | item 39871624

Hello71 | 1 year ago

This sounds like a Windows problem, plus compression settings. Your wlog is 24 instead of 21, meaning decompression will use more memory. After adjusting those for a fair comparison, pack still wins slightly but not massively:

  Benchmark 1: tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
    Time (mean ± σ):      2.573 s ±  0.091 s    [User: 8.611 s, System: 1.981 s]
    Range (min … max):    2.486 s …  2.783 s    10 runs
   
  Benchmark 2: bsdtar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
    Time (mean ± σ):      3.400 s ±  0.250 s    [User: 8.436 s, System: 2.243 s]
    Range (min … max):    3.171 s …  4.050 s    10 runs
   
  Benchmark 3: busybox tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
    Time (mean ± σ):      2.535 s ±  0.125 s    [User: 8.611 s, System: 1.548 s]
    Range (min … max):    2.371 s …  2.814 s    10 runs
   
  Benchmark 4: ./pack -i ./linux-6.8.2 -w
    Time (mean ± σ):      1.998 s ±  0.105 s    [User: 5.972 s, System: 0.834 s]
    Range (min … max):    1.931 s …  2.250 s    10 runs
   
  Summary
    ./pack -i ./linux-6.8.2 -w ran
      1.27 ± 0.09 times faster than busybox tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
      1.29 ± 0.08 times faster than tar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
      1.70 ± 0.15 times faster than bsdtar -c ./linux-6.8.2 | zstd -cT0 --zstd=strat=2,wlog=24,clog=16,hlog=17,slog=1,mml=5,tlen=0 > linux-6.8.2.tar.zst
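
A side note on the wlog value in the commands above: zstd's window log is the base-2 logarithm of the match window, and the decompressor has to keep roughly one window of history around. A quick sketch of the arithmetic, using only the two values mentioned (21 and 24); real allocations add some overhead on top of this:

```python
# Window size implied by a zstd window log: 2**wlog bytes.
def window_bytes(wlog: int) -> int:
    return 1 << wlog


def mib(n: int) -> float:
    return n / (1024 * 1024)


baseline_wlog = 21  # the value the comment treats as the baseline
pack_wlog = 24      # the value the comment attributes to pack

for wlog in (baseline_wlog, pack_wlog):
    print(f"wlog={wlog}: {mib(window_bytes(wlog)):.0f} MiB window")

print(f"ratio: {window_bytes(pack_wlog) // window_bytes(baseline_wlog)}x")
```

So the gap is 2 MiB versus 16 MiB of window, a factor of 8, which is why the settings need to match before comparing speed.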
Results on another machine are similar. My guess is that most of the difference comes from tar saving file attributes such as creation and modification times, which pack doesn't.
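
To make the attribute point concrete, here is a minimal sketch using Python's standard tarfile module. It only shows which per-file attributes a tar header carries; it does not measure what they cost. The path, timestamp, and ids are made-up demo values.

```python
import io
import tarfile

# Build a one-file tar archive in memory.
payload = b"int main(void) { return 0; }\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="linux/main.c")  # hypothetical path
    info.size = len(payload)
    info.mtime = 1_700_000_000  # arbitrary timestamp for the demo
    info.mode = 0o644
    info.uid, info.gid = 1000, 1000
    tar.addfile(info, io.BytesIO(payload))

# Read it back: the attributes come out of the per-file header.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmembers()[0]

print(member.name, member.size, member.mtime, oct(member.mode),
      member.uid, member.gid)
```

Every member gets a header like this, so an archiver that skips these fields has less to collect per file. Whether that explains the measured gap is still a guess.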

> it is done in two steps: first creating tar and then compression

Pipes (originally Unix, subsequently copied by MS-DOS) operate in parallel, not sequentially: zstd starts compressing while tar is still writing. This lets a pipeline process arbitrarily large inputs in little memory, without first buffering the whole intermediate tar file.
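
A small sketch of why this must be concurrent, using an OS pipe with two threads standing in for tar and zstd (the names and sizes are my own, picked for the demo). The producer pushes 16 MiB through a pipe whose kernel buffer is far smaller (typically 64 KiB on Linux). If the reader only started after the writer finished, the writer would block forever once the buffer filled.

```python
import os
import threading

CHUNK = 64 * 1024         # bytes per write/read call
TOTAL = 16 * 1024 * 1024  # far more than the pipe's kernel buffer


def producer(write_fd: int) -> None:
    """Stand-in for `tar -c`: stream TOTAL bytes into the pipe."""
    block = b"\0" * CHUNK
    with os.fdopen(write_fd, "wb") as w:
        sent = 0
        while sent < TOTAL:
            w.write(block)  # blocks whenever the pipe is full
            sent += CHUNK
    # Closing the write end signals EOF to the reader.


def consumer(read_fd: int) -> int:
    """Stand-in for `zstd`: read until EOF, holding one chunk at a time."""
    received = 0
    with os.fdopen(read_fd, "rb") as r:
        while chunk := r.read(CHUNK):
            received += len(chunk)
    return received


def run_pipeline() -> int:
    read_fd, write_fd = os.pipe()
    writer = threading.Thread(target=producer, args=(write_fd,))
    writer.start()
    received = consumer(read_fd)  # runs while the writer is still writing
    writer.join()
    return received


print(run_pipeline() == TOTAL)
```

The pipeline finishes even though neither side ever holds more than one chunk plus the pipe buffer, which is only possible if both ends run at the same time.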

OttoCoddo|1 year ago

Thank you for the new numbers. Sure, it can be different on different machines, especially file systems. For me on Linux and ext4, Pack finishes the Linux code base in just 0.96 s.

Anyway, I do not expect an order of magnitude difference between tar.zst and Pack; after all, Pack is using Zstandard. What makes Pack fundamentally different from tar.zst is Random Access and other important factors like user experience. I shared some numbers on it here: https://news.ycombinator.com/item?id=39803968 and you are encouraged to try them for yourself. Also, once Encryption and Locking are added to Pack, Random Access will be even more beneficial.
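
To illustrate what Random Access buys in general terms, here is a rough sketch with Python's tarfile. It is not Pack's actual format: the side index below is a hypothetical stand-in for any container that records where each file lives. A plain tar stream has no such index, so reaching the last member means walking every header before it, and with tar.zst the stream also has to be decompressed up to that point.

```python
import io
import tarfile


def build_tar(n: int) -> bytes:
    """Create an in-memory tar with n small files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(n):
            data = f"file {i}\n".encode()
            info = tarfile.TarInfo(name=f"f{i:04d}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_sequential(archive: bytes, wanted: str) -> tuple[bytes, int]:
    """Walk headers from the start; return (data, headers visited)."""
    visited = 0
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar:
            visited += 1
            if member.name == wanted:
                return tar.extractfile(member).read(), visited
    raise KeyError(wanted)


def build_index(archive: bytes) -> dict[str, tuple[int, int]]:
    """Hypothetical side index: name -> (data offset, size)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar}


def find_indexed(archive: bytes, index: dict, wanted: str) -> bytes:
    """Jump straight to the file's bytes using the index."""
    offset, size = index[wanted]
    return archive[offset:offset + size]


archive = build_tar(100)
index = build_index(archive)
data, visited = find_sequential(archive, "f0099.txt")
print(visited, data == find_indexed(archive, index, "f0099.txt"))
```

The sequential lookup touches all 100 headers to reach the last file, while the indexed lookup is a single slice. The gap grows with the number of files in the archive.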