Having been around the block optimizing things for throughput - the fact that a multi-threaded "A" is slower than a single-threaded "B" when A's thread count is low is a red flag.
I should have made my intentions clearer from the start :)
We should make a clarification... The intent of KySync (as well as Zsync) is to use across systems, not on a single system. KySync supports HTTP (like Zsync) as well as HTTPS (which Zsync does not).
Could be useful in backend infrastructure if you have beefy machines processing lots of IO. There could be a myriad of other use cases. Hell, some people have 10 Gbps machines at home. Even modern CPUs, definitely mobile ones, can struggle to hit line rate in certain applications like large file transfers unless you’ve got a beefier machine.
> in this case it takes 4-8 threads or so for KySync to match Zsync's performance.
Given that it runs client-side, pairing with plain rsync servers, and is meant to efficiently support mass file distribution, using multiple cores less efficiently in exchange for better overall performance seems a good tradeoff.
My reading is that with zsync the server can be any HTTP server, and the client is zsync. Rsync, on the other hand, needs rsync on the server, and it scales badly because it does most of the work on the server.
While I found the concept interesting, attempting to click through and scroll using spacebar/pgup/pgdn just doesn't work, and arrow keys do something very different from normal, both in Firefox and Chrome, which makes me think it's a deliberate choice.
Are you talking about the Notion site? I don’t think this is under his control. In case you haven’t tried it, Notion is a note-taking app (it’s pretty good).
Very cool, thanks for sharing. I did a deep dive in the past into various syncing/binary diff protocols and really liked zsync. It was probably my top choice for the application I was designing but I ended up not using it. The library I did use is called bita: https://github.com/oll3/bita. It is inspired by the same family of projects as zsync. The main advantage I found with bita is that the core logic is encapsulated in a library so that you don’t only have to use the binaries but can integrate it directly into an application. I’d be curious to know if that’s in the plans for KySync.
Any recommendations for best practices in "modern C++"? I have to resurrect an old codebase and would like to refactor it using more modern idioms, but it's been a loooong time since I wrote any decent C++.
kyotov|4 years ago
KySync is an efficient way to distribute new data which makes use of older (but similar) data that you may already have present locally. KySync supports HTTP v1.1, but can easily be extended to support any server protocol which supports range queries.
KySync is [released](https://github.com/kyotov/kysync/releases) under the MIT Open Source License (see [COPYING](https://github.com/kyotov/ksync/blob/master/COPYING) in root of repository).
KySync is a full rewrite of [Zsync](http://zsync.moria.org.uk/) in modern C++. While no code was reused from Zsync, the awesome [Zsync technical paper](http://zsync.moria.org.uk/paper200503/) was the major resource used for the implementation of KySync.
The value proposition of KySync over Zsync is that it takes advantage of modern architecture features (multi-core multi-CPU systems as well as exceedingly fast IO subsystems, e.g. NVMe SSDs). KySync is 3x-10x (or more) faster than Zsync on such commonly available modern hardware. We have not spent much time optimizing KySync single-thread performance, so there are cases where, with sufficiently high similarity, Zsync is faster when fewer than 4 threads are used in KySync.
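For readers unfamiliar with range queries, here is a minimal sketch of how a client might turn a list of missing blocks into an HTTP/1.1 `Range` header value. The function names, block size, and file size are hypothetical and are not taken from the KySync code base; the sketch also assumes the server accepts multiple ranges in one request, which not every server does.

```python
def coalesce_ranges(block_indices, block_size, file_size):
    """Merge missing block indices into inclusive (start, end) byte ranges."""
    ranges = []
    for index in sorted(set(block_indices)):
        start = index * block_size
        if start >= file_size:
            break  # block lies past the end of the file; nothing to fetch
        # HTTP byte ranges are inclusive, and the last block may be short.
        end = min(start + block_size, file_size) - 1
        if ranges and ranges[-1][1] + 1 == start:
            # Adjacent to the previous range: extend it instead of adding a new one.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


def range_header(ranges):
    """Format inclusive byte ranges as an HTTP/1.1 Range header value."""
    return "bytes=" + ",".join(f"{start}-{end}" for start, end in ranges)


if __name__ == "__main__":
    # Hypothetical case: blocks 0, 1 and 4 of a 5000-byte file are missing locally.
    missing = coalesce_ranges([0, 1, 4], block_size=1024, file_size=5000)
    print(range_header(missing))  # bytes=0-2047,4096-4999
```

Merging adjacent blocks keeps the header short and reduces the number of parts the server has to return.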
killingtime74|4 years ago
huhtenberg|4 years ago
I skimmed through kysync docs and I don't see any meaningful discussion of this aspect.
If you are to start with B and then parallelize its bottlenecks (where possible), you will normally see performance gradually increasing with the thread count, then plateauing and only then, possibly, starting to drop.
But if A with a single thread is outright worse than B, it means that A's per-thread overhead is higher. In the case of kysync it appears to be 3-5 times as high as zsync's. That's immense!
Similarly, if A with a low thread count is worse than B, then it may be an issue with A's sync/threading model (prolonged contended waits, etc.).
In either case, it's a sign that there's something inherently inefficient with A's implementation... even if spawning 16 threads helps offset it and beat B's single thread.
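One way to put rough numbers on this argument is Amdahl's law. The sketch below is only a model, not a measurement of KySync or Zsync: it assumes B's single-threaded runtime is 1.0, that A is `overhead_factor` times slower on one thread, and that a fraction `parallel_fraction` of A's work scales perfectly with thread count. All names and values are illustrative.

```python
def runtime(single_thread_time, parallel_fraction, threads):
    """Amdahl's law: the serial part stays fixed, the parallel part is split across threads."""
    serial = single_thread_time * (1.0 - parallel_fraction)
    parallel = single_thread_time * parallel_fraction / threads
    return serial + parallel


def break_even_threads(overhead_factor, parallel_fraction, max_threads=64):
    """Smallest thread count at which A matches B's single-threaded runtime of 1.0.

    Returns None if A never catches up within max_threads.
    """
    for threads in range(1, max_threads + 1):
        if runtime(overhead_factor, parallel_fraction, threads) <= 1.0:
            return threads
    return None


if __name__ == "__main__":
    # With 4x single-thread overhead and perfect scaling, A needs 4 threads to tie.
    print(break_even_threads(4.0, 1.0))   # 4
    # If only 95% of the work parallelizes, it takes 5 threads.
    print(break_even_threads(4.0, 0.95))  # 5
    # If only 70% parallelizes, the serial part alone (1.2) exceeds B's runtime.
    print(break_even_threads(4.0, 0.70))  # None
```

Under this model, a 3-5x per-thread overhead is exactly what would produce a break-even point of around 4 threads or more.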
kyotov|4 years ago
KySync started as a pet project to brush up on my C++ skills which I had not used for 20 years and was going to need for a job that I started last week! I had spent my fair share of time optimizing single threaded performance back in grad school, and was looking for different experiences.
That said, and as we mentioned in the write up, we had some improvements cooking that we did not put in v1.0. I spent some time tonight to merge them in and run some performance experiments.
Single threaded performance is now much improved and practically on par with Zsync. I released it as v1.1.
https://kyall.notion.site/KySync-v1-1-dd9931f330f241469d3e60...
We will of course keep looking for more performance, but if you have any suggestions for further improvements, please share -- or join us!
Best, Kamen
P.S. Both of these are contributed by Chaim Mintz, whom I worked with in a previous life.
einpoklum|4 years ago
Within a single physical system ("multi-core multi-CPU" or with NVMe) - you rarely use something like rsync, zsync or KySync: your files are already local, on your system. If you want a second copy of a file on the same system, you would symlink or hard-link, or use other filesystem mechanisms. It's not even clear a clean copy would be slower than doing a bunch of comparisons.
On the other hand, between remote systems, the "modern day architecture features" mostly don't apply. I suppose a more clever use of modern kernels could help performance somewhat. Maybe.
kyotov|4 years ago
The primary reason to do the performance comparison on a single system is so that the results are easy to replicate with as little setup as possible. Because we do this for both KySync and Zsync, it is apples to apples.
HTTP bandwidth and storage cost money, and this is a self-funded project, so I can't afford to put up test data files publicly visible to the world.
One thing we can look into is leveraging AWS/S3 to upload some data and use it for a performance experiment, but that will need some logistics for the developer to set up their AWS account properly. Will look into it.
Of course, the more similar the files are, the closer the remote results will be to this first set we published.
vlovich123|4 years ago
kettleballroll|4 years ago
Assuming zsync is single-threaded, in an apples/apples comparison (1 thread) zsync wins by over 10x. I can't help but wonder how much faster KySync would be if they focused more on single-threaded performance. The way things stand, KySync's implementation sounds rather wasteful in terms of CPU resources, in spite of what this announcement claims. Still, kudos on this impressive project!
kbenson|4 years ago
One thing most client/home systems have in abundance most of the time is spare cores.
stingraycharles|4 years ago
zaphirplane|4 years ago
Zsync improves on rsync by:
- supports handling of some compressed formats
- target can be HTTP
- offloads more work to the client
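To illustrate what "more work on the client" can look like, here is a minimal sketch of client-side block matching: the client scans its old local file with a rolling weak checksum and confirms candidates with a strong hash, so only the blocks it cannot find need to be downloaded. This is an rsync-style illustration under stated assumptions, not Zsync's or KySync's actual code: the function names are hypothetical, SHA-256 is a stand-in for whatever strong checksum the real tools use, and a trailing partial block is ignored.

```python
import hashlib

MOD = 1 << 16


def weak_checksum(block):
    """rsync-style weak checksum: plain byte sum plus position-weighted sum."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, outgoing, incoming, block_size):
    """Slide the window one byte to the right in O(1)."""
    a = (a - outgoing + incoming) % MOD
    b = (b - block_size * outgoing + a) % MOD
    return a, b


def find_reusable_blocks(seed, target, block_size):
    """Map each full target block index to an offset in seed holding the same bytes."""
    # Stand-in for the per-block checksums a server would publish for the target.
    wanted = {}
    for index in range(len(target) // block_size):
        block = target[index * block_size:(index + 1) * block_size]
        entry = (hashlib.sha256(block).digest(), index)
        wanted.setdefault(weak_checksum(block), []).append(entry)

    found = {}
    if len(seed) < block_size:
        return found
    a, b = weak_checksum(seed[:block_size])
    offset = 0
    while True:
        # Cheap weak lookup first; the strong hash is computed only on a weak hit.
        for strong, index in wanted.get((a, b), []):
            window = seed[offset:offset + block_size]
            if index not in found and hashlib.sha256(window).digest() == strong:
                found[index] = offset
        if offset + block_size >= len(seed):
            break
        a, b = roll(a, b, seed[offset], seed[offset + block_size], block_size)
        offset += 1
    return found


if __name__ == "__main__":
    reusable = find_reusable_blocks(b"__efgh_abcd", b"abcdefghWXYZ", block_size=4)
    print(reusable)  # blocks 0 and 1 are found locally; block 2 must be downloaded
```

All of the scanning and hashing happens on the client, which is why the server side can be a plain HTTP server that only answers range requests.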
ntoshev|4 years ago
rincebrain|4 years ago
Please consider not breaking how many people browse web sites.
killingtime74|4 years ago
lazypenguin|4 years ago
kyotov|4 years ago
kyotov|4 years ago
In summary, we see:
- 10%-100%+ performance improvement with v1.1 on 2 GiB data and 16 threads;
- Significantly improved single-thread performance, now on par with Zsync for 2 GiB data.
Check out the whole write-up at https://kyall.notion.site/KySync-v1-1-dd9931f330f241469d3e60...
czep|4 years ago
kyotov|4 years ago
ComodoHacker|4 years ago
kyotov|4 years ago
Zsync has a comparison with rsync in isolated cases where it makes sense and it is very close in performance!
http://zsync.moria.org.uk/paper/ch02s08.html
tda|4 years ago
kyotov|4 years ago