DefineOutside|6 months ago

This has been applied to Minecraft region files in a fork of Paper, a Minecraft server implementation.

https://github.com/UltraVanilla/paper-zstd/blob/main/patches...

From the author of this patch, on Discord: compression level 9 isn't practical (it's too slow for a real production server), but it does show the effectiveness of zstd with a shared dictionary.

  So you start off with a 755.2 MiB world (in this test, a section of an existing DEFLATE-compressed world that has been lived in for a while). If you recreate its regions, it compacts down to 695.1 MiB.

  You set region-file-compression=lz4 and run --recreateRegionFiles, and it turns into a 998.9 MiB world. Makes sense: worse compression ratio but less CPU, which is what Mojang documented in the changelog. Neat, but I'm confused about what the benefits are, as I/O is increasingly the more constrained resource nowadays. This is just a brief detour from what I'm really trying to test.

  You set region-file-compression=none and it turns into a 3583.0 MiB world. The largest region file in this sample was 57 MiB.

  Now you take this world and compress each of the region files individually using zstd -9, so that the region files become .mca.zst files. You get a world that is 390.2 MiB.
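
For reference, that last step is nothing fancy, just compressing each region file whole. A minimal sketch in Python (assuming the third-party zstandard package; the world path is illustrative):

  # compress every region file whole at zstd level 9, as in the test above
  import pathlib
  import zstandard

  cctx = zstandard.ZstdCompressor(level=9)
  for mca in pathlib.Path("world/region").glob("*.mca"):
      out = mca.parent / (mca.name + ".zst")   # r.0.0.mca -> r.0.0.mca.zst
      out.write_bytes(cctx.compress(mca.read_bytes()))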

lordpipe|6 months ago

Author here -- the solution I discussed in that message isn't quite the same as the one linked. The `paper-zstd` repository is the one using dictionary compression on individual chunks. The `.mca.zst` solution doesn't use dictionaries at all; it's more like a glorified LinearPaper -- take the region file, decompress the individual chunks, and recompress the entire 1024-chunk container together. That breaks random access to individual chunks, but it's great for archival or cloud-storage offloading of infrequently visited parts of an MC world, which is what I'm using it for.
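
A rough sketch of that repack, not the actual patch code (assumes the third-party zstandard package, vanilla gzip/zlib-compressed chunks, and a made-up length-prefixed output framing for illustration):

  import struct
  import zlib
  import zstandard

  def repack_region(path):
      raw = open(path, "rb").read()
      chunks = []
      for i in range(1024):                     # 32x32 location table entries
          entry = struct.unpack_from(">I", raw, i * 4)[0]
          offset, sectors = entry >> 8, entry & 0xFF
          if sectors == 0:
              chunks.append(b"")                # chunk never generated
              continue
          base = offset * 4096
          length = struct.unpack_from(">I", raw, base)[0]
          # byte after the length is the scheme (1=gzip, 2=zlib);
          # wbits=47 lets zlib auto-detect either header
          payload = raw[base + 5 : base + 4 + length]
          chunks.append(zlib.decompress(payload, 47))
      blob = b"".join(struct.pack(">I", len(c)) + c for c in chunks)
      return zstandard.ZstdCompressor(level=9).compress(blob)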

I don't remember the exact compression ratios for the dictionary solution in that repo, but it wasn't quite as impressive (IIRC around a 5% reduction compared to non-dictionary zstd at the same level). The padding inherent to the region format also takes away a lot of the ratio benefit right off the bat. It might have worked better in conjunction with the PaperMC SectorFile proposal, which has less padding, or by rewriting the storage on top of some sort of LSM-tree library that is good at compactly storing blobs of varying size. I've dropped the dictionary idea for now, but it could definitely be useful. More research is needed.
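
For the curious, the dictionary variant looks roughly like this; the dictionary size and sample selection here are illustrative, not what the repo uses:

  import zstandard

  def make_chunk_codec(sample_chunks):
      # train a shared dictionary on a corpus of decompressed chunk blobs
      d = zstandard.train_dictionary(112640, sample_chunks)
      comp = zstandard.ZstdCompressor(level=9, dict_data=d)
      decomp = zstandard.ZstdDecompressor(dict_data=d)
      return comp, decomp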

masklinn|6 months ago

> You set region-file-compression=lz4 and run --recreateRegionFiles, and it turns into a 998.9 MiB world. Makes sense: worse compression ratio but less CPU, which is what Mojang documented in the changelog. Neat, but I'm confused about what the benefits are, as I/O is increasingly the more constrained resource nowadays. This is just a brief detour from what I'm really trying to test.

Might make sense if the region files are on a fast SSD and the server is more CPU-constrained? I assume the server reads from and writes to the region files during activity, and a 3.5x increase in I/O throughput (both ways) at very little CPU cost is pretty attractive. IIRC at lower compression levels deflate is about an order of magnitude more expensive than lz4.

zstd --fast is also quite attractive, but I'm always unsure what the level of parallelism is in benchmarks, as zstd is multithreaded by default and benchmarks tend to report wallclock rather than CPU seconds.
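
If you want numbers without that ambiguity, measuring CPU time in-process avoids it; e.g. in Python (the lz4 package is third-party, and the input file is a placeholder):

  import time, zlib
  import lz4.frame

  data = open("r.0.0.mca", "rb").read()        # placeholder sample input
  for name, fn in [("zlib -1", lambda d: zlib.compress(d, 1)),
                   ("lz4", lz4.frame.compress)]:
      t0 = time.process_time()                 # CPU seconds, not wallclock
      for _ in range(50):
          out = fn(data)
      dt = time.process_time() - t0
      print(f"{name}: {dt:.2f}s CPU, ratio {len(data) / len(out):.2f}")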

lordpipe|6 months ago

> Might make sense if the region files are on a fast SSD and the server is more CPU-constrained?

I wrote that when the feature had just come out. It's now been a while since Minecraft started natively supporting the LZ4 chunk-compression option, and it seems safe to say the tradeoff does in fact make sense, even on quite powerful CPUs. Several servers have adopted it and seen decent improvements.

adgjlsfhk1|6 months ago

The great thing about zstd is that it has a ton of encoding options, but the decoder is basically the same for all of them.
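
Concretely, frames produced with very different encoder settings all go through one decompressor (sketch with the third-party zstandard package):

  import zstandard

  data = b"minecraft chunk data " * 1000
  frames = [zstandard.ZstdCompressor(level=lvl).compress(data)
            for lvl in (1, 9, 19)]             # cheap through near-max
  dctx = zstandard.ZstdDecompressor()          # one decoder for all of them
  assert all(dctx.decompress(f) == data for f in frames)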

immibis|6 months ago

Note that each region file contains 1024 chunks that are designed to be (but probably aren't) accessed at random, so compressing a whole region file is like creating a solid archive with a solid block size of 1024 files.
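
Concretely, random access in a vanilla .mca file is just a header lookup plus a seek, which compressing the whole file as one stream forfeits. A sketch of that lookup (standard library only, format details per the vanilla region layout):

  import struct

  def read_chunk(f, x, z):                     # x, z in 0..31 within region
      f.seek(4 * ((x & 31) + 32 * (z & 31)))   # 4 KiB location table
      entry = struct.unpack(">I", f.read(4))[0]
      offset, sectors = entry >> 8, entry & 0xFF
      if sectors == 0:
          return None                          # chunk never generated
      f.seek(offset * 4096)
      length = struct.unpack(">I", f.read(4))[0]
      scheme = f.read(1)[0]                    # 1=gzip, 2=zlib, 3=none, 4=lz4
      return scheme, f.read(length - 1)        # still-compressed payload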