
Show HN: I used LLMs to build a compression tool that beats xz on x86_64 ELFs

4 points | mohsen1 | 11 hours ago

I wanted to see whether AI (mostly ChatGPT Pro and Gemini Pro 3.1) could work out how to compress executable binaries better than existing general-purpose tools, even though I don't know much about compression engineering or ELF internals.

The result is an experiment called fesh. It works strictly as a deterministic pre-processor pipeline wrapping LZMA (xz). The AI kept identifying "structural entropy boundaries" and told me to extract near-branches, normalize jump tables, rewrite .eh_frame DWARF pointers to absolute image bases, delta-encode ELF .rela structs with ZigZag mappings, and force column transpositions before compressing the results in separate LZMA channels.
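To make one of those transforms concrete, here is a minimal Python sketch of delta + ZigZag encoding with column transposition applied to an x86_64 `.rela` table. It assumes the section bytes have already been sliced out of the ELF file and consist of 24-byte `Elf64_Rela` records (`r_offset`, `r_info`, `r_addend`). The names `encode_rela` and `decode_rela` are hypothetical; this is an illustration of the technique, not fesh's actual code or stream format.

```python
import lzma
import struct

# Elf64_Rela on x86_64: three little-endian 64-bit fields
# (r_offset, r_info, r_addend). All are unpacked as unsigned ("<QQQ")
# so every bit pattern, including negative addends, round-trips exactly.
RELA = struct.Struct("<QQQ")
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit unsigned value as two's-complement signed."""
    return x - (1 << 64) if x >= (1 << 63) else x


def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small magnitudes get small codes."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def unzigzag(z: int) -> int:
    """Inverse of zigzag: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ..."""
    return (z >> 1) ^ -(z & 1)


def encode_rela(table: bytes) -> bytes:
    """Delta + ZigZag each field, then emit the fields column by column."""
    if len(table) % RELA.size:
        raise ValueError("table length is not a multiple of 24 bytes")
    rows = list(RELA.iter_unpack(table))
    columns = []
    for col in range(3):
        prev = 0
        out = bytearray()
        for row in rows:
            # Wrapping 64-bit difference, viewed as signed before ZigZag.
            delta = to_signed64((row[col] - prev) & MASK64)
            out += struct.pack("<Q", zigzag(delta))
            prev = row[col]
        columns.append(bytes(out))
    return b"".join(columns)


def decode_rela(blob: bytes) -> bytes:
    """Inverse of encode_rela: un-transpose, un-ZigZag, prefix-sum."""
    n = len(blob) // RELA.size
    values = [v for (v,) in struct.iter_unpack("<Q", blob)]
    rows = [[0, 0, 0] for _ in range(n)]
    for col in range(3):
        prev = 0
        for i in range(n):
            prev = (prev + unzigzag(values[col * n + i])) & MASK64
            rows[i][col] = prev
    return b"".join(RELA.pack(*row) for row in rows)


if __name__ == "__main__":
    # Synthetic R_X86_64_RELATIVE (type 8) entries, like a PIE's .rela.dyn.
    table = b"".join(
        RELA.pack(0x3DE8 + 8 * i, 8, 0x1130 + 0x40 * i) for i in range(1000)
    )
    packed = encode_rela(table)
    assert decode_rela(packed) == table
    print("raw:", len(lzma.compress(table, preset=9)),
          "transformed:", len(lzma.compress(packed, preset=9)))
```

The idea is that grouping each field into its own column puts near-constant deltas (a fixed stride in `r_offset`, a repeated `r_info`) next to each other, which gives LZMA long runs of identical bytes instead of interleaved 24-byte rows.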

Surprisingly, it actually works. The CI verifies that compression is perfectly reversible (bit-for-bit identical output) across 103 Alpine Linux x86_64 packages. According to the benchmarks, it consistently produces smaller payloads than xz -9e --x86 (XZ with the x86 BCJ filter), ZSTD, and Brotli, averaging around 6% smaller than xz at its maximum BCJ settings.
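For anyone who wants to reproduce this kind of comparison, here is a hedged Python sketch of the baseline plus a bit-exact round-trip check. It assumes `xz -9e --x86` corresponds to an x86 BCJ filter followed by LZMA2 at preset 9 with the extreme flag, expressed through the standard-library `lzma` module. `check`, `saving_percent` and the stand-in candidate codec are hypothetical helpers, not part of fesh.

```python
import lzma
import sys

# Assumed equivalent of `xz -9e --x86`: x86 BCJ, then LZMA2 preset 9 extreme.
XZ_BCJ_FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME},
]


def xz_bcj(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=XZ_BCJ_FILTERS)


def check(data: bytes, compress, decompress) -> tuple[int, int]:
    """Compress with baseline and candidate, require exact round trips,
    and return (baseline_size, candidate_size) in bytes."""
    baseline = xz_bcj(data)
    if lzma.decompress(baseline) != data:
        raise AssertionError("xz baseline failed to round-trip")
    candidate = compress(data)
    if decompress(candidate) != data:
        raise AssertionError("candidate failed to round-trip")
    return len(baseline), len(candidate)


def saving_percent(baseline: int, candidate: int) -> float:
    """Positive when the candidate is smaller than the baseline."""
    return 100.0 * (baseline - candidate) / baseline


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bench.py /path/to/some/elf
    with open(sys.argv[1], "rb") as f:
        elf = f.read()

    # Stand-in candidate: plain LZMA2 without BCJ. A real pre-processor would
    # transform the bytes before compressing and undo it after decompressing.
    def no_bcj(d: bytes) -> bytes:
        return lzma.compress(d, preset=9 | lzma.PRESET_EXTREME)

    base, cand = check(elf, no_bcj, lzma.decompress)
    print(f"xz+BCJ {base} B, candidate {cand} B, "
          f"saving {saving_percent(base, cand):+.2f}%")
```

Averaging per-file percentages and comparing total bytes over the whole corpus can give different headline numbers, so it is worth stating which one a figure like 6% refers to.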

I honestly have no idea how much of this is genuinely novel versus standard practices in extreme binary packing (like Crinkler/UPX).

Repo: https://github.com/mohsen1/fesh

Does this architecture have any actual merits for standard distribution formats, or is this just overfitting the LZMA dictionary to Alpine's compiler outputs? I'd love to hear from people who actually understand compression math.

3 comments

dvjakhar | 11 hours ago

I don't know much about compression, but this is nothing less than a breakthrough.

fsrc | 11 hours ago

Cool! I was just learning about xz yesterday =)

mohsen1 | 11 hours ago

Same! I thought there might be some low-hanging fruit that AI could pick, so I prompted away!