bbbbbr|4 years ago

The linked GBC version is my fork with some improvements (and more in the works).

The current published release uses a similar compression approach by zeta_two, but in current builds I've switched to the compression by arpruss, since the total data + decompression code size is now a couple hundred bytes smaller.

I did some profiling and code size measurements before switching over: https://github.com/bbbbbr/gb-wordle/blob/compress_arpruss/wo...

Speed (and code size, somewhat) have improved more since then.
quicktwo|4 years ago

Not sure how big the word dict is in your latest version, but you can do much better simply by reordering how you create your index:
With alphabet in order, assembling letters ABCDE: 17345.00 bytes
With alphabet in order, assembling letters EDCBA: 16949.00 bytes
With alphabet order tweaked, assembling letters EDCBA: 16309.00 bytes
Where "tweaked" means you build your offset as if each position were ordered like this ([::-1] means reverse, if you're unfamiliar with Python):
```
alpha1 = "abcdestfghijklmnopqruvwxyz"[::-1]
alpha2 = "eaioustrbcdfghjklmnpqvwxyz"
alpha3 = "aeioustrbcdfghjklmnpqvwxyz"
alpha4 = "eaiousthrbcdfgjklmnpqvwxyz"
alpha5 = "aeioustryhkbcdfgjlmnpqvwxz"
```
You can also use a length-prefix code rather than a continuation-bit (varint) encoding; that lets you spend 2 bits of overhead to represent a number bigger than 2^14, rather than 3. This might hurt your ability to decode, though, as you'll have bits that cross byte boundaries.
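A minimal sketch of that trade-off; this is my guess at the scheme being described (a 2-bit prefix giving the payload byte count), not the actual implementation, and the function names are made up:

```python
# Hypothetical sketch: a 2-bit length prefix vs. a 7-bits-per-byte varint.
# A varint spends 1 continuation bit per byte, so a value in [2^14, 2^21)
# needs 3 bytes and 3 overhead bits. A 2-bit prefix saying "1, 2, or 3
# payload bytes follow" always costs exactly 2 overhead bits -- but the
# prefixes no longer land on byte boundaries, complicating the decoder.

def encode_prefixed(values):
    """Concatenate (2-bit byte-count prefix + payload bytes) into one bitstring."""
    bits = ""
    for v in values:
        nbytes = max(1, (v.bit_length() + 7) // 8)
        assert nbytes <= 3, "sketch only covers values below 2^24"
        bits += format(nbytes - 1, "02b")              # prefix: 0..2 -> 1..3 bytes
        bits += format(v, "0{}b".format(8 * nbytes))   # byte-sized payload
    return bits

def decode_prefixed(bits):
    values, i = [], 0
    while i + 2 <= len(bits):
        nbytes = int(bits[i:i + 2], 2) + 1
        i += 2
        values.append(int(bits[i:i + 8 * nbytes], 2))
        i += 8 * nbytes
    return values

vals = [5, 300, 20000]                  # 20000 > 2^14
enc = encode_prefixed(vals)
assert decode_prefixed(enc) == vals
```

For 20000, a 7-bit varint needs 24 bits (3 bytes), while this prefix scheme needs 2 + 16 = 18 bits, at the cost of unaligned reads.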
You can get much smaller using varints with 3-bit groups rather than 7-bit (13,110 bytes), but I presume that would perform worse on GB hardware than staying byte aligned.
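The indexing-plus-varint pipeline above can be sketched as follows. This assumes (my reading of the comment, not confirmed) that the dictionary stores sorted mixed-radix word indices, delta-encoded as byte-aligned 7-bit varints; the word list is a toy stand-in, and only the alphabet strings are quoted from the comment:

```python
# Hypothetical sketch of why per-position alphabet order matters: common
# letters get small digits, real words cluster at small indices, the deltas
# between consecutive indices shrink, and so do the varints encoding them.

def word_index(word, alphas):
    """Mixed-radix index of a 5-letter word under per-position alphabets."""
    idx = 0
    for ch, alpha in zip(word, alphas):
        idx = idx * 26 + alpha.index(ch)
    return idx

def varint_len(n):
    """Bytes needed to store n as a 7-payload-bits-per-byte varint."""
    return max(1, (n.bit_length() + 6) // 7)

def encoded_size(word_list, alphas):
    indices = sorted(word_index(w, alphas) for w in word_list)
    deltas = [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]
    return sum(varint_len(d) for d in deltas)

words = ["adieu", "crane", "raise", "slate", "stare", "trace"]  # toy list
plain = ["abcdefghijklmnopqrstuvwxyz"] * 5
tweaked = ["abcdestfghijklmnopqruvwxyz"[::-1],  # orderings quoted from the comment
           "eaioustrbcdfghjklmnpqvwxyz",
           "aeioustrbcdfghjklmnpqvwxyz",
           "eaiousthrbcdfgjklmnpqvwxyz",
           "aeioustryhkbcdfgjlmnpqvwxz"]
print(encoded_size(words, plain), encoded_size(words, tweaked))
```

With the real ~13k-word list, the gap between orderings is what produces the 17,345 → 16,309 byte difference reported above; a tiny toy list won't necessarily show the same direction.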
quicktwo|4 years ago

Not sure what technique you're using for the answers list, but compress5.py suggests it's doing a basic bitmap.
The base bitmap is 12,972 bits, or 1,622 bytes (your file lists 1,619; not sure why it's 3 bytes smaller, but all the same). You can "skip encode" it (I don't know the formal name for this technique) down to 1,232 bytes by writing each all-zero run of three bits [0, 0, 0] as a single [0], and anything else as [1, X, X, X], saving another 390 bytes.
I tried all run lengths between 1 and 7, and 3 is optimal.
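The skip encoding above can be sketched like this (the bitmap is random toy data of the stated 12,972-bit size, not the real answers list):

```python
# Hypothetical sketch of the skip encoding described above: walk the bitmap
# in 3-bit groups, emit a lone 0 bit for an all-zero group, and a 1 bit
# followed by the 3 literal bits otherwise. Run length 3 is the value the
# commenter found optimal for the real data.
import random

RUN = 3

def skip_encode(bits):
    out = []
    for i in range(0, len(bits), RUN):
        group = bits[i:i + RUN]
        if any(group):
            out.append(1)      # literal marker
            out.extend(group)  # the 3 bits as-is
        else:
            out.append(0)      # whole zero group collapsed to one bit
    return out

def skip_decode(enc, nbits):
    out, i = [], 0
    while len(out) < nbits:
        if enc[i] == 0:
            out.extend([0] * RUN)
            i += 1
        else:
            out.extend(enc[i + 1:i + 1 + RUN])
            i += 1 + RUN
    return out[:nbits]

random.seed(1)
# Sparse toy bitmap: mostly zeros, like a valid-answers subset of all words.
bitmap = [1 if random.random() < 0.1 else 0 for _ in range(12972)]
enc = skip_encode(bitmap)
assert skip_decode(enc, len(bitmap)) == bitmap
```

The win depends on sparsity: each all-zero group costs 1 bit instead of 3, while each nonzero group costs 4 instead of 3, which is why sweeping the run length (1 through 7) against the real data is the right way to pick it.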