It would average the same size as the actual data. Treating pi's bit sequence as random bits, and ignoring overlap effects, the probability that a given n-bit sequence is the one you want is 1/2^n, so you need to try on average 2^n starting positions to find it. The index is therefore typically about n bits long, up to some second-order effects having to do with the expectation of a log not being the log of an expectation.
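A quick simulation makes this concrete (a sketch; the function names and trial counts here are illustrative, not from the thread): searching a stream of random bits for a random n-bit pattern, the mean index of the first match is on the order of 2^n, so writing that index down costs about n bits.

```python
import random

def first_match_index(pattern: str, rng: random.Random) -> int:
    """Scan a stream of fresh random bits for `pattern`; return the start index."""
    window = "".join(rng.choice("01") for _ in range(len(pattern)))
    i = 0
    while window != pattern:
        window = window[1:] + rng.choice("01")
        i += 1
    return i

def average_index(n: int, trials: int = 200, seed: int = 0) -> float:
    """Average first-match index over `trials` random n-bit patterns."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pattern = "".join(rng.choice("01") for _ in range(n))
        total += first_match_index(pattern, rng)
    return total / trials

# The averages grow roughly like 2**n, so the index alone needs ~n bits:
for n in (4, 8, 12):
    print(n, average_index(n))
```

Overlap effects (self-overlapping patterns like all-zeros wait longer) nudge the average above 2^n, which is the second-order caveat mentioned above.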
You need both index and length, I guess. If concatenating both values doesn't shrink the size enough, you can always prefix a "number of times still needed to recursively de-index" counter to concatenated (repeat, start-point-index, size) triplets, and repeat until you reach the desired size or lower.
I don’t know if there would be any logical issue with this approach. The only logistical difficulty I can see is computing enough decimals and searching for the pattern in them, but I guess such a voluminous precomputed approximation could greatly help.
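The index-plus-length scheme can be sketched in a few lines. This is a toy, with a pseudo-random bit string standing in for a precomputed table of pi's bits (assuming pi's bits behave like random bits); `ORACLE`, `encode`, and `decode` are names invented for the example.

```python
import random

# Stand-in for "precomputed digits of pi": a fixed pseudo-random bit string.
N = 1 << 20
ORACLE = format(random.Random(42).getrandbits(N), f"0{N}b")

def encode(bits: str) -> tuple[int, int]:
    """The 'compressed' form is (index of first occurrence, length)."""
    i = ORACLE.find(bits)
    if i < 0:
        raise ValueError("pattern not found; need a longer oracle")
    return i, len(bits)

def decode(index: int, length: int) -> str:
    return ORACLE[index:index + length]

data = "1011001110001"
idx, n = encode(data)
assert decode(idx, n) == data
# idx.bit_length() is typically about n, so (index, length) is no
# shorter than the data itself on average.
print(n, idx.bit_length())
```

Note the failure mode: an n-bit pattern typically first appears around position 2^n, so the oracle must be exponentially longer than anything you hope to find in it.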
No invertible function can map every non-negative integer to a strictly smaller one, and if it never maps anything to a larger one it must be the identity (no perfect compression). But you can have functions that compress everything we care about at the cost of increasing the size of things we don't care about. So the recursive de-indexing strategy has to fail sometimes and increase the cost (once you account for storing the prefix).
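The counting argument behind this is easy to check directly (a minimal sketch; `shorter_strings` is a name invented here): there are 2^n bit strings of length n, but only 2^n - 1 strings of length strictly less than n, so no injective map can shorten every n-bit input.

```python
def shorter_strings(n: int) -> int:
    """Number of bit strings of length strictly less than n: 1 + 2 + ... + 2**(n-1)."""
    return sum(2 ** k for k in range(n))

for n in range(1, 16):
    assert shorter_strings(n) == 2 ** n - 1
    # Pigeonhole: 2**n inputs cannot map injectively into 2**n - 1 shorter outputs,
    # so at least one n-bit string fails to compress.
    assert shorter_strings(n) < 2 ** n
```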
waldrews|1 year ago
psychoslave|1 year ago