For some reason, I thought this was about the update step in games that happens once per 'tick', that is, the physics engine loop. It's about lossless compression and downloading update packages, though. That's also fine with me.
Interesting, but I'd like more details on what's happening at the client.
Take Steam for example. For some games, downloading the update takes seconds, but calculating diffs and extracting takes 10-20 minutes. That's great for Valve, because little bandwidth is used, but terrible on the client side. On top of that, the update process slows the rest of the system almost to a halt because of all the hard drive activity.
I can potentially see this mechanism making the same mistake.
> On top of that, the update process slows the rest of the system almost to a halt, because of all the hard drive activity
As far as I'm aware, that's a problem only on Linux, because Windows has a desktop-grade IO scheduler tuned to interactive usage (whereas in Linux both the CPU and IO schedulers are written for maximum throughput).
Blizzard blew it from the moment they started downloading/syncing their games via BitTorrent - using their customers bandwidth to support games that they paid for.
Good article. As a note, I love how he uses hand-drawn diagrams. I have yet to find any tool that allows me to draw diagrams as fast as I can on a piece of paper.
One thing that seemed glossed over, so I'm not sure if it's obvious for their use case, is the trade-off between compress once, distribute many times.
When looking at how long it takes to compress vs transmit, the optimisation was done to make the sum of both as small as possible - minimise(time(compress) + time(transmit)).
Instead, it seems like what you want to do is minimise(time(compress) + expected_transmissions * time(transmit)).
For any reasonable number of distributed copies of a game, it seems like this time to transmit will quickly come to dominate the total time involved.
I suspect, however, that the time to compress grows extremely quickly, for not much gain in compression, so the potential improvement is probably tiny even if you expect to be transmitting to millions of clients.
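To make that trade-off concrete, here's a quick back-of-the-envelope sketch. The compression times, sizes, and bandwidth below are made-up numbers purely for illustration, not measurements of any real compressor:

```python
# Compare "minimise compress + one transmit" against weighting transmit
# time by the expected number of downloads. All numbers are invented.

# (compression_time_s, compressed_size_mb) per hypothetical level
levels = {
    "fast":   (10.0, 500.0),
    "medium": (60.0, 450.0),
    "best":   (600.0, 430.0),
}

BANDWIDTH_MB_S = 10.0  # assumed per-client bandwidth

def total_time(level, expected_transmissions):
    compress_s, size_mb = levels[level]
    transmit_s = size_mb / BANDWIDTH_MB_S
    return compress_s + expected_transmissions * transmit_s

# With one download, "fast" wins; with a million downloads, the one-off
# compression cost is amortised away and "best" wins easily.
for n in (1, 1_000_000):
    winner = min(levels, key=lambda lv: total_time(lv, n))
    print(n, winner)
```

This also shows the flip side of the argument above: once expected_transmissions is large, the compress term barely matters, so the only question left is how much extra compression the slower levels actually buy.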
The rsync example confuses me a little bit. If you add a single bit to the front, then all the bytes are shifted into different blocks and nearly none will hash to match. But if you add a single bit, rsync still performs well. Can someone explain why that differs from the explanation?
The problem also applies to the binary delta. Adding a prefix will shift everything forward causing a diff in everything.
Bsdiff solves this with suffix sorting. But what does rsync do? Or am I just wrong that rsync still works well? In either case, I think the offset problem makes for a more interesting motivating example for bsdiff.
The rsync algorithm divides the file into fixed-size blocks only on the sending side, then calculates checksums for all blocks. On the receiving side, it tries to match them at all offsets, not just multiples of the block size.
Thus, in your example, the first (and possibly the last) block won't be found, but all other blocks will be found, shifted by an offset of 1.
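A toy sketch of that matching, assuming a tiny 4-byte block size. Real rsync uses a rolling Adler-32-style weak checksum plus a strong checksum so it can slide the window cheaply; a plain dictionary lookup per window is enough to show why a one-byte prefix doesn't break matching:

```python
# Toy rsync-style matching: hash fixed-size blocks of the old file,
# then slide a window over the new file one byte at a time, looking
# for those hashes at *every* offset, not just block boundaries.

BLOCK = 4  # unrealistically small, for illustration

def block_hashes(data: bytes) -> dict:
    """Map each non-overlapping BLOCK-sized chunk to its block index."""
    return {data[i:i + BLOCK]: i // BLOCK
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}

def match_offsets(old: bytes, new: bytes):
    """Yield (offset_in_new, block_index_in_old) for every match."""
    table = block_hashes(old)
    i = 0
    while i <= len(new) - BLOCK:
        window = new[i:i + BLOCK]
        if window in table:
            yield i, table[window]
            i += BLOCK  # consume the matched block
        else:
            i += 1      # no match: slide the window by one byte

old = b"ABCDEFGHIJKL"   # three 4-byte blocks
new = b"x" + old        # one byte prepended; everything shifts
print(list(match_offsets(old, new)))
# -> [(1, 0), (5, 1), (9, 2)]: all three blocks found, each shifted by 1
```

Only the window that straddles the inserted byte fails to match; every block after it is found again at offset + 1, which is exactly the behaviour described above.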
shmerl | 9 years ago
I wish GOG would also open up their client and release it cross-platform. Or at least document their protocol, as they promised.
Kenji | 9 years ago
Cursuviam | 9 years ago
yAnonymous | 9 years ago
Asooka | 9 years ago
richardwhiuk | 9 years ago
hvidgaard | 9 years ago
ZeroClickOk | 9 years ago
fivesigma | 9 years ago
jamesgeck0 | 9 years ago
hvidgaard | 9 years ago
oldrny | 9 years ago
[1]: https://en.wikipedia.org/wiki/Ipe_(software)
[2]: https://dreampuf.github.io/GraphvizOnline/
arjie | 9 years ago
Cogito | 9 years ago
arjie | 9 years ago
rbehrends | 9 years ago
unknown | 9 years ago
[deleted]
kuon | 9 years ago
The butler client is an incredible way of sending games to itch.io, and I sincerely wish that Apple and Google had good command line clients as well.