top | item 5592234

(no title)

lwat | 13 years ago

The way I make sense of this is that you need fewer (slow) disk reads to get the same amount of data into RAM, so that might explain the speedup?

I agree that it sounds too good to be true though.


rosser|13 years ago

Your read is correct. Once CPU time spent in decompression became less than disk wait time for the same data uncompressed, the reduced IO with compression started to win — sometimes massively. As powerful as processors are these days, results like these aren't impossible, or even terribly unlikely.
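
The break-even point can be sketched with a back-of-the-envelope model. This is only a rough illustration: it assumes disk reads and decompression overlap perfectly, so the slower stage sets the total time. The function names and all throughput figures are made-up assumptions, not measurements from this thread.

```python
def read_time_uncompressed(size_bytes, disk_bps):
    """Seconds to read the raw data straight off the disk."""
    return size_bytes / disk_bps


def read_time_compressed(size_bytes, ratio, disk_bps, decompress_bps):
    """Seconds to get the same data out of a compressed file.

    Assumes disk reads and decompression are pipelined, so the slower of
    the two stages sets the total time. `decompress_bps` is measured in
    bytes of decompressed *output* per second.
    """
    disk_seconds = (size_bytes / ratio) / disk_bps
    cpu_seconds = size_bytes / decompress_bps
    return max(disk_seconds, cpu_seconds)


# Hypothetical numbers: 8 GB of logs, 10:1 compression,
# a 40 MB/s disk, and a decompressor that emits 200 MB/s.
size = 8_000_000_000
plain = read_time_uncompressed(size, 40_000_000)
packed = read_time_compressed(size, 10, 40_000_000, 200_000_000)
print(f"uncompressed: {plain:.0f}s, compressed: {packed:.0f}s")
```

With these assumed figures the compressed path is CPU-bound rather than disk-bound, yet it still finishes well ahead of the uncompressed read. Compression stops paying off once the decompressor's output rate falls below the disk's raw throughput.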

Consider the analogous (if simplified) case of logfile parsing, from my production syslog environment, with full query logging enabled:

  # ls -lrt
  ...
  -rw------- 1 root root  828096521 Apr 22 04:07 postgresql-query.log-20130421.gz
  -rw------- 1 root root 8817070769 Apr 22 04:09 postgresql-query.log-20130422
  # time zgrep -c duration postgresql-query.log-20130421.gz
  19130676

  real	0m43.818s
  user	0m44.060s
  sys	0m6.874s
  # time grep -c duration postgresql-query.log-20130422
  18634420

  real	4m7.008s
  user	0m9.826s
  sys	0m3.843s

EDIT: I'm not sure why time(1) is reporting more "user" time than "real" time in the compressed case.

caf|13 years ago

zgrep runs grep and gzip as two separate subprocesses, so on a machine with multiple CPUs the job as a whole can accumulate more CPU time than wall-clock time. It's just showing you that you exploited some parallelism, with grep and gzip running simultaneously for part of the time.
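
To put numbers on that, the ratio of CPU time (user + sys) to wall-clock time can be computed from the two time(1) outputs quoted above. This is a quick sketch; the helper name `cpu_utilization` is only for illustration.

```python
def cpu_utilization(real_s, user_s, sys_s):
    """CPU seconds consumed per second of wall-clock time."""
    return (user_s + sys_s) / real_s


# Timings copied from the time(1) output above, converted to seconds.
zgrep_ratio = cpu_utilization(real_s=43.818, user_s=44.060, sys_s=6.874)
grep_ratio = cpu_utilization(real_s=4 * 60 + 7.008, user_s=9.826, sys_s=3.843)

print(f"zgrep: {zgrep_ratio:.2f}")  # above 1.0: more than one CPU was busy
print(f"grep:  {grep_ratio:.2f}")   # far below 1.0: the process mostly waited
```

A ratio above 1.0 is only possible when work runs on more than one CPU at once, which fits the grep-plus-gzip pipeline. The very low ratio for plain grep suggests it spent most of its wall-clock time blocked, presumably on disk reads.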

tracker1|13 years ago

I had an original IBM PC XT (used) with a 10MB full-height (2x today's 5.25") MFM hard drive. It had about 3MB of available disk space and took, I swear, 6+ minutes to boot.

It actually ran faster double-spaced (Stacker) and had nearly 12MB of available space. Surprisingly enough, it didn't have any problems with programs loading, which became more of an issue when moving on to a 486.

Yeah, when your storage is that slow relative to the CPU, having the CPU run compression can get you impressive gains in both space and performance.