Isn't compression the best general-purpose solution, with theoretical guarantees? I mean, even simple Huffman coding, especially behind a dictionary coder as in DEFLATE, would pick up the repeated keys in the key-value pairs and compress them. But if you want to extract more juice, that implies knowing the data types of the values instead of treating everything as strings. It would be like having types for logs and not messing them up.
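A rough sketch of what I mean, not a benchmark. The log records, field names, and column layout below are all made up for illustration. It compresses the same records twice with `zlib` (DEFLATE, i.e. LZ77 plus Huffman): once as plain JSON text, and once after packing each field according to an assumed type.

```python
import json
import random
import struct
import zlib

random.seed(0)  # deterministic sample data

# Hypothetical log records: the keys repeat on every line, only the values change.
records = [
    {"ts": 1700000000 + i, "level": "INFO", "latency_ms": random.randint(1, 500)}
    for i in range(1000)
]

# Baseline: newline-delimited JSON text.
raw = "\n".join(json.dumps(r) for r in records).encode()

# Generic compression: DEFLATE finds the repeated keys without any schema.
generic = zlib.compress(raw, 9)

# Type-aware layout (assumed schema): one column per field, each with a real type.
# Timestamps become deltas, latencies 16-bit integers, levels a one-byte code.
levels = sorted({r["level"] for r in records})
ts = [r["ts"] for r in records]
deltas = [ts[0]] + [b - a for a, b in zip(ts, ts[1:])]
columns = (
    struct.pack(f"<{len(deltas)}I", *deltas)
    + struct.pack(f"<{len(records)}H", *(r["latency_ms"] for r in records))
    + bytes(levels.index(r["level"]) for r in records)
)
typed = zlib.compress(columns, 9)

print("raw:", len(raw), "generic:", len(generic), "typed:", len(typed))
```

On this toy data the generic pass already shrinks the text a lot, and the typed layout shrinks it further, because the compressor no longer has to rediscover that a timestamp is a slowly increasing integer. The catch is that the typed version only works if the schema is known and every record sticks to it.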