I appreciate seeing this article that I wrote 4 years ago resurface :) Happy to answer any questions about it or the tech behind it.
We also use a SQLite database, in a manner similar to the OP’s article: we use it to track which content-defined chunks sit at which offsets in which files on disk. We deduplicate those chunks on the CDN so you have to download less data when updating the game, but on disk we need to recreate the files exactly as originally uploaded, because that’s how games load them.
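A minimal sketch of what such a chunk-tracking database could look like (the table and column names here are my assumptions for illustration, not the actual schema):

```python
import sqlite3

# Hypothetical schema: map each content-defined chunk's hash to the
# file and byte offset where that chunk lives on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    file_id INTEGER PRIMARY KEY,
    path    TEXT NOT NULL UNIQUE
);
CREATE TABLE chunks (
    chunk_hash BLOB NOT NULL,                             -- content hash of the chunk
    file_id    INTEGER NOT NULL REFERENCES files(file_id),
    offset     INTEGER NOT NULL,                          -- byte offset within the file
    length     INTEGER NOT NULL,
    PRIMARY KEY (file_id, offset)
);
CREATE INDEX chunks_by_hash ON chunks(chunk_hash);
""")

# Record where one chunk lives, then look up a local copy of a chunk
# the patcher needs, so it can be copied from disk instead of downloaded.
conn.execute("INSERT INTO files (file_id, path) VALUES (1, 'game/data.pak')")
conn.execute("INSERT INTO chunks VALUES (?, 1, 0, 65536)", (b"\x12\x34",))
row = conn.execute(
    "SELECT path, offset, length FROM chunks JOIN files USING (file_id) "
    "WHERE chunk_hash = ?",
    (b"\x12\x34",),
).fetchone()
print(row)  # ('game/data.pak', 0, 65536)
```

The index on `chunk_hash` is the important part: when a new build references a chunk, one lookup tells you whether an identical chunk already exists locally and where to read it from.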
We’ve expanded the use of this tech quite a bit since the article. For example, my team has started using the patcher internally to manage our 2-million-file vendored code and binaries repo, as a replacement of sorts for Git LFS. Cloning the repo from scratch became 10 times faster, and so did switching branches, especially for remote devs, since the files are served through CloudFront.
Some of the more interesting work we’ve done recently has been optimizing the core content-defined chunking algorithm. The GearHash rolling hash from FastCDC is decently fast, but we were able to improve chunking speeds by 5x with vectorized implementations, using AES-NI instructions in a clever way to avoid the GearHash table lookups. It now does 11 GB/sec per core using AVX-512 instructions.
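For anyone unfamiliar with Gear-based chunking, here’s a scalar sketch of the core idea (the table, mask, and minimum-chunk values are illustrative; the vectorized and AES-NI variants mentioned above compute the same kind of boundaries, just much faster):

```python
import os
import random

# Illustrative GearHash-style content-defined chunking (FastCDC family).
random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # 256-entry random table
MASK = (1 << 16) - 1  # boundary odds ~1/2^16 -> ~64 KiB average chunks
MIN_CHUNK = 2048      # don't emit tiny chunks

def chunk_boundaries(data: bytes):
    """Yield (start, end) byte ranges of each content-defined chunk."""
    h, start = 0, 0
    for i, b in enumerate(data):
        # Gear update: shift, then mix in one table entry per input byte.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        # Declare a boundary when the low bits of the hash are all zero.
        if i - start >= MIN_CHUNK and (h & MASK) == 0:
            yield (start, i + 1)
            start, h = i + 1, 0
    if start < len(data):
        yield (start, len(data))

data = os.urandom(1 << 20)
chunks = list(chunk_boundaries(data))
```

Because boundaries depend only on the bytes in a small sliding window, inserting or removing data early in a file only perturbs nearby chunks; later boundaries resynchronize, which is what makes chunk-level dedup between game versions work.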
Another thing we did recently was build a tool for repacking Unreal Engine Pak files so game assets maintain a consistent layout across versions, similar to what garaetjjte mentioned in a comment above about UnityFS files. This reduced the disk I/O needed to update VALORANT by over 90% and cut update times by half for players. The combination of content-defined chunking with tooling to make game data files friendlier to differential patching can make a huge difference.