top | item 31443308

(no title)

mota7 | 3 years ago

Yes. The formulation in the blog post is a bit messy, but possibly more general way to think about it is something like:

0. set y = 1. set x = 0.

1. How many possible choices are there _given what we've picked so far_? Call it N.

2. Pick one of them. Call it 'a'.

3. set x = x + a * y. set y = y * N.

4. Repeat steps 1-3 until the number of choices drops to be less than 2.

Decoding is the same thing in reverse. Start with x. mod it with the first N. that's the first choice. divide by N, mod with the number of remaining choices. that's the second choice. etc.

For the case of n choose k, we get the rule that we can't pick anything lower than what has been picked so far. This means that first choice is one of (n-k+1) items. (We can't pick higher than that as we'd run out of choices before we reached k).

With this construction, we can see your case is easy. You just need to be able to signal that some choice was the last choice. So the each choice is from a set of N+1, being the N possibilities plus "picked nothing". Picking nothing means there's no remaining choices so it terminates.

discuss

order

lifthrasiir|3 years ago

This is an easy formulation but not necessarily optimal. In particular the step 1 is not well defined, and any further definition results in a non-optimal encoding.

Say we have n = 6 and k = 3 and we've already seen 4; what are possible choices at this point? If you allow 3 here it would be equivalent to picking 3 first then 4 and there would be a redundant encoding. You can "fix" this by fixing the order, so without a loss of generality let's assume a strictly increasing order. Then the worst case would be (1, 2, 3), where N is respectively 6, 5 and 4, resulting in log_2 120 = 6.90 bits of information entropy instead of the optimal log_2 24 = 4.32 bits.

The real fix would involve counting each subcases. There are 10 strictly increasing sequences starting with 1, 6 starting with 2, 3 starting with 3 and one starting with 4. The step 1 should account for this non-uniform distribution at the first place. The combinatorial number system described in the OP does this automatically.

mota7|3 years ago

For the first case, you can't pick a number lower than any existing number. So if 4 is already picked, then only 5 or 6 can be picked, but if we pick 6, then there's no third number available to pick, so there only 1 possibility (picking 5). So it terminates.

The worst case of (1,2,3) is respectively 4, 3 and 2, resulting in log_2 24 = 4.32 bits.

The first choice isn't picking from 6, it's picking from 4. (because if 5 or 6 are chosen, then there's not enough following numbers to be able to choose 2 more numbers). That's the piece about '(n-k+1) choices for the first number'. 5 or 6 are not possible choices given the ordering requirement.

xeyownt|3 years ago

Great! Very clear. This should be the TL;DR... or the dense formulation of the dense encoding ;-)