an LLM can be used to losslessly compress a string to a size equal to the number of bits of entropy of next token prediction loss over the string, by encoding the extra bits of entropy with arithmetic encoding. its sota compression for the distribution of string found on the internet
No comments yet.