Zamicol | 1 year ago
Not according to my math:
Numeric: 1000/1024 = 98%
Alphanum: 2025/2048 = 99%
Byte: 191/256 = 75%
Kanji: 13/16 = 81%*
Alphanumeric is the most efficient QR code encoding mode.
(Just to make this clearer: QR Byte mode uses ISO/IEC 8859-1, where 65 characters are undefined, so 191/256, which is ~75%. If character encoding isn't an issue, then byte encoding is the most efficient, 256/256, 100%, but that's a very rare edge case. Also, last time I did the math on Kanji it was about 81% efficient. *I have not dug too deep into Kanji and there may be a way to make it more efficient than I'm aware of. I've never considered it useful for my applications so I have not looked.)
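For anyone who wants to reproduce the ratios above, here is a minimal sketch of the "values used / values available" calculation. The `MODES` table and the `ratio_percent` helper are names made up for this illustration. Kanji is left out because the 13/16 figure is not derived in the comment.

```python
# Hypothetical helper: ratio of code points actually used to code points
# the bit group could hold, using the figures quoted in the comment.
MODES = {
    "numeric": (1000, 1024),       # 3 digits packed into 10 bits
    "alphanumeric": (2025, 2048),  # 2 characters (45 * 45) packed into 11 bits
    "byte": (191, 256),            # defined ISO/IEC 8859-1 characters in 8 bits
}

def ratio_percent(used: int, available: int) -> float:
    """Return used/available as a percentage."""
    return 100 * used / available

for name, (used, available) in MODES.items():
    print(f"{name}: {ratio_percent(used, available):.1f}%")
```

Rounded to whole percentages this gives the 98%, 99% and 75% figures quoted above.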
Dylan16807|1 year ago
That is a semi-correct calculation of the wrong number. Base45 does not use all 45 characters in every slot. It goes 16 bits at a time, so the character storing the upper bits only has 2^16/45^2 = 33 possible values.
The most straightforward way to measure efficiency is to see that base45 takes 32 source bits and encodes them into 33 bits. The way you're calculating, that's only 2^32/2^33 = 50%.
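A rough sketch of where the 33 values and the 32-to-33-bit figure come from, assuming the RFC 9285 scheme (two bytes become three characters, least significant character first) and QR alphanumeric packing of 11 bits per character pair plus 6 bits for a leftover character. The function names are invented for this example.

```python
BASE45 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def rfc_base45_encode(data: bytes) -> str:
    """Encode bytes two at a time into three base 45 characters each."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        if len(chunk) == 2:
            n = chunk[0] * 256 + chunk[1]   # 16-bit value, 0..65535
            n, c = divmod(n, 45)            # lowest base 45 digit
            e, d = divmod(n, 45)            # middle and highest digits
            out += [BASE45[c], BASE45[d], BASE45[e]]
        else:
            d, c = divmod(chunk[0], 45)     # single trailing byte: two characters
            out += [BASE45[c], BASE45[d]]
    return "".join(out)

def qr_alphanumeric_bits(n_chars: int) -> int:
    """Data bits QR alphanumeric mode spends on n_chars characters."""
    return (n_chars // 2) * 11 + (n_chars % 2) * 6

# The highest digit never exceeds 65535 // 45**2, so it takes 33 values (0..32).
highest_digit_values = 65535 // 45**2 + 1

encoded = rfc_base45_encode(b"\x00\x01\x02\x03")   # 32 source bits
print(len(encoded), qr_alphanumeric_bits(len(encoded)), highest_digit_values)
```

This only counts data bits; the mode indicator and character count header are ignored.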
But the better way to calculate efficiency is to take the log of everything (in other words, count how many bits are needed). Numeric is log(1000)/log(1024) which is 99.7%. Alphanum is 99.9%. Base45 is 97%.
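The log-based numbers can be checked with a few lines. This is only a sketch; `log_efficiency` is a name chosen for this example, and the base45 figure assumes the 32 source bits per 33 encoded bits stated above.

```python
import math

def log_efficiency(values_used: int, bits: int) -> float:
    """Information carried (log2 of usable values) per bit spent."""
    return math.log2(values_used) / bits

numeric = log_efficiency(1000, 10)       # 3 digits in 10 bits
alphanumeric = log_efficiency(45**2, 11) # 2 characters in 11 bits
base45 = 32 / 33                         # 32 source bits per 33 encoded bits

for name, value in [("numeric", numeric), ("alphanumeric", alphanumeric), ("base45", base45)]:
    print(f"{name}: {100 * value:.1f}%")
```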
And I don't know where that kanji number came from. It stores 13 bits at a time, mapping to 8192 shift-JIS code points, and the vast majority of them are valid. It's pretty efficient.
Zamicol|1 year ago
Huh? I don't necessarily care about an exact "base45", I care about QR code alphanumeric, which just so happens to be a (generic) base 45 character set. For QR code, two characters are encoded into 11 bits.
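To make "two characters are encoded into 11 bits" concrete, here is a small sketch of the alphanumeric data packing: each pair becomes 45 * first + second in 11 bits, and a leftover character takes 6 bits. `pack_alphanumeric` is a name invented for this example, and it produces only the data bits, not the mode indicator, character count, or padding.

```python
QR_ALPHANUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def pack_alphanumeric(text: str) -> str:
    """Return the data bits for text as a string of '0' and '1'."""
    values = [QR_ALPHANUM.index(ch) for ch in text]  # ValueError on invalid chars
    chunks = []
    # Full pairs: 45 * first + second, written as 11 bits.
    for i in range(0, len(values) - 1, 2):
        chunks.append(format(values[i] * 45 + values[i + 1], "011b"))
    # An odd trailing character is written on its own as 6 bits.
    if len(values) % 2:
        chunks.append(format(values[-1], "06b"))
    return "".join(chunks)

bits = pack_alphanumeric("AC-42")
print(bits, len(bits))
```

For "AC-42" the pairs are "AC" (10 * 45 + 12 = 462) and "-4" (41 * 45 + 4 = 1849), followed by the single "2".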
>in every slot.
I've worked with the QR code standards pretty seriously and I am unfamiliar with the term "slots" being used by the standards. This is why I suspect you're referring specifically to RFC base45 (although the term isn't used there either), which QR code doesn't care about. I also don't care about RFC Base 45 and would prefer to use a more bit space efficient method, such as the iterative divide by radix method, which I also call "natural base conversion".
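As a rough illustration of an iterative divide by radix conversion, the sketch below treats the whole input as one big integer and repeatedly divides by the radix. This is an assumption about what the method looks like, not the commenter's actual implementation; `to_base` and `from_base` are hypothetical names. Note that leading zero bytes are lost in this simple form.

```python
QR_ALPHANUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def to_base(data: bytes, alphabet: str = QR_ALPHANUM) -> str:
    """Convert bytes to a string of digits in base len(alphabet)."""
    n = int.from_bytes(data, "big")
    radix = len(alphabet)
    digits = []
    while n > 0:
        n, remainder = divmod(n, radix)   # peel off the lowest digit
        digits.append(alphabet[remainder])
    return "".join(reversed(digits)) or alphabet[0]

def from_base(text: str, alphabet: str = QR_ALPHANUM) -> int:
    """Convert a digit string back to the integer it represents."""
    n = 0
    for ch in text:
        n = n * len(alphabet) + alphabet.index(ch)
    return n

encoded = to_base(b"\xff\xff\xff\xff")   # 32 bits of input
print(encoded, len(encoded), from_base(encoded))
```

Because the digits are not confined to fixed 16-bit groups, every character position can take all 45 values except the most significant one.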
> base45 takes 32 source bits
For QR code alphanumeric, 6 characters use 33 bits, not 32.
> way to calculate efficiency
We've termed the way we calculate this, for example 2025/2048, "bit space efficiency". I'm not sure how widely this term is used in the rest of the industry. On a related matter, I thought I had read "the iterative divide by radix algorithm" in industry, but after searching it turns out to be a term novel to our work.
This is also similar to the way Shannon originally calculated entropy and appears to be a fundamental representation of information. Of course log is useful, but it often results in partial bits or rounding, 5.5 in the case of alphanumeric, which is somewhat absurd considering that the bit is the quantum of information, again as shown by Shannon. There is no such thing as a partial bit that can be communicated, since information is fundamental to communication, so we've found the fractional representation to be more informative and easier to work with.
Granted, in all of this, when I have done the math (and I have done a lot of math on this particular issue), there appeared to be some very extreme edge cases in the final QR code where some arbitrary data encoded into QR numeric was slightly more efficient than alphanumeric, but overall alphanumeric was more efficient almost all the time. There are other considerations, like padding and escaping, that make exact calculation more difficult than it's worth. I just needed a "most of the time" calculation and that's where I stopped.
For more detail on my work: my BASE45 dates to 2019, predating the RFC by 2 years, and I published a base 45 alphabet, BASE45, by March 1, 2020, a whole year before the RFC. A patent including BASE45 was submitted June 22, 2021: https://image-ppubs.uspto.gov/dirsearch-public/print/downloa...
As a matter of fact, because of the issues and confusion surrounding base conversion, I wrote this tool in 2019:
https://convert.zamicol.com
It is the first arbitrary base conversion tool on the web. It also was essential for our work with QR code and other base conversion issues.