ssdfsdf | 11 years ago

Hmm, yes, this may be true, even on average.

I would still caution against using any old encoding technique on a string representation of the genome and using the compressed length as any sort of meaningful measure of the inherent information contained within it.

eli_gottlieb|11 years ago

Yes, but some compression algorithms are still nice ways to approximate, for ordering or measurement purposes, the algorithmic complexity of a string.
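As a rough illustration of using compressed size for ordering, here is a minimal sketch. It assumes zlib as the stand-in compressor and uses two made-up strings over the ACGT alphabet; `approx_complexity` and `samples` are hypothetical names, and the result is only an upper-bound proxy, not the true algorithmic complexity.

```python
import random
import zlib


def approx_complexity(s: str) -> int:
    """Compressed size in bytes: a crude upper-bound proxy for complexity."""
    return len(zlib.compress(s.encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Two illustrative strings of equal length (both invented for this sketch).
samples = {
    "repeat": "ACGT" * 250,
    "random": "".join(rng.choice("ACGT") for _ in range(1000)),
}

# Order the strings from "simplest" to "most complex" by compressed size.
ranking = sorted(samples, key=lambda name: approx_complexity(samples[name]))

for name in ranking:
    print(name, len(samples[name]), approx_complexity(samples[name]))
```

The repetitive string should sort first, since the compressor can exploit its period while the pseudo-random one offers little to exploit.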

ssdfsdf|11 years ago

I'm not sure that is true. Take, for instance, the first 100 prime numbers printed one after another in a string. The string is long and apparently random, yet it has little algorithmic complexity, since the program that prints the numbers is fairly simple. A standard compression algorithm will not be able to compress the string very effectively.

Therefore, I am not sure that compressing the string is likely to give you a sense of the information contained within it, at least not in the sense we are interested in.
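The prime-number example can be tried out with a rough sketch like the one below. It assumes zlib as the "standard compression algorithm" and trial division as the simple generating program; the function names are hypothetical, and the exact byte counts will depend on the compressor chosen.

```python
import zlib


def first_primes(n: int) -> list[int]:
    """Return the first n primes using simple trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_string(n: int) -> str:
    """Concatenate the decimal digits of the first n primes."""
    return "".join(str(p) for p in first_primes(n))


def compressed_length(s: str) -> int:
    """Size in bytes of the zlib-compressed ASCII encoding of s."""
    return len(zlib.compress(s.encode("ascii"), 9))


s = prime_string(100)
print("raw length:       ", len(s))
print("compressed length:", compressed_length(s))
# For comparison: a trivially regular string of the same length.
print("same-length run:  ", compressed_length("a" * len(s)))
```

The generator above is only a few lines long, while the compressor sees a digit stream with little repetition to exploit, which is the gap between algorithmic complexity and compressed size that the example points at.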