nh2 | 26 days ago

I was about to say:

The question was what exactly rsync pipelines, and whether it serialises its network sends. If it does, that would be a plausible explanation for parallelism speeding it up.

Serial local reads are not a plausible cause, because the author describes working on NVMe SSDs, whose latency is so low that it cannot explain reading 59 GB across 3000 files taking 8 minutes.

However:

You might actually be half-right, because in the main output shown in the blog post, the author is NOT using local SSDs. The invocation is `rsync ... /Volumes/mercury/* /Volumes/...` where `mercury` is a network share mount (and it is unspecified what kind of share that is). So in that case, every read that looks "local" to rsync is actually a network access. It is entirely possible that rsync treats local reads as fast and thus does not pipeline them.

In fact, it is even highly likely that rsync will not / cannot pipeline reads of files that appear local to it, because normal POSIX file IO does not really offer a way to read regular files non-blockingly, so the only way to do that is with threads, which rsync doesn't use.
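
As a minimal sketch (illustrative only, not rsync's actual code; `copy_one` and the `emit` callback are made up to stand in for whatever sends the data on):

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Serial, blocking per-file read loop. With the source on a network
       mount, each read() of the 32 KiB buffer is a synchronous round trip
       to the file server, and nothing else happens while it waits. */
    static void copy_one(const char *path,
                         void (*emit)(const char *buf, ssize_t len))
    {
        char buf[32 * 1024];
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            emit(buf, n);   /* hand the chunk to the sender */
        close(fd);
    }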

(Extra evidence that rsync uses normal blocking writes and does not support threads, beyond the fact that no threading code exists in rsync's repo: https://github.com/RsyncProject/rsync/blob/236417cf354220669...)

So while "the dead time isn't waiting for network trips between files" would be wrong -- it absolutely would wait for network trips between files -- your "filesystem access and general threading is the question" would be spot-on.

So in that case rclone is just faster because it reads from his network mount in parallel. This would also explain why he reports `tar` as not being faster: it, too, reads files serially from the network mount. Presumably this situation could be avoided by running rsync "normally" via SSH, so that file reads are actually fast on the remote side.
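
For concreteness (hostname and paths here are placeholders, not the author's actual setup): `rsync -a /Volumes/mercury/data/ /Volumes/backup/data/` pulls every block through the network mount with blocking `read()`s, while `rsync -a user@mercury:/data/ /Volumes/backup/data/` starts rsync on the remote machine over SSH, so the file reads happen locally there and only the file data streams back over the connection.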

The situation is made extra confusing by the author writing, below his run output:

    even experimenting with running the rsync daemon instead of SSH
when in fact the rsync run shown above didn't go over SSH at all.

Another weird thing I spotted is that the rsync output shown in the post

    Unmatched data: 62947785101 B
seems impossible: the string "Unmatched data" doesn't seem to exist in the rsync source code, and hasn't since 1996. So it is unclear to me what version of rsync was used.

I posted this as a comment on https://www.jeffgeerling.com/blog/2025/4x-faster-network-fil...

Dylan16807|26 days ago

> Serial local reads are not a plausible cause, because the author describes working on NVMe SSDs, whose latency is so low that it cannot explain reading 59 GB across 3000 files taking 8 minutes.

But the people you responded to were talking about slowdowns that exist in general, not just ones that apply directly to the post.

For the post, my personal guess is that per-file overhead isn't a huge factor here, and it's mostly rsync having trouble doing >1Gbps over the network.

> In fact, it is even highly likely that rsync will not / cannot pipeline reads of files that appear local to it, because normal POSIX file IO does not really offer a way to read regular files non-blockingly, so the only way to do that is with threads, which rsync doesn't use.

Makes sense.

> it absolutely would wait for network trips between files

I don't see why you're saying this. I expect it to serially read files and then put that data into a buffer that can have data from multiple files at the same time. In other words, pipelined networking. As long as the transfer queue doesn't bottom out it shouldn't have to wait for any network round trips. What leads you to think otherwise?

nh2|26 days ago

> But the people you responded to were talking about slowdowns that exist in general, not just ones that apply directly to the post.

I think that's incorrect though. These slowdowns do not exist in general (see my next reply, where I run rsync and it immediately maxes out my 10 Gbit/s).

I think the original poster digiown is right with "Note there is no intrinsic reason running multiple streams should be faster than one [EDIT: 'at this scale']. It almost always indicates some bottleneck in the application". In this case it's the user running rsync, a serially-reading program, against a network mount.

> rsync having trouble doing >1Gbps over the network

rsync copies at 10 Gbit/s without problem between my machines.

Though I have to pass `-e 'ssh -c aes256-gcm@openssh.com'` (or aes128-gcm), otherwise encryption bottlenecks at 5 Gbit/s with the default `chacha20-poly1305@openssh.com`.
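
A full invocation then looks something like `rsync -a -e 'ssh -c aes256-gcm@openssh.com' /data/ otherhost:/data/` (hostname and paths are placeholders, not my actual setup).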

> I don't see why you're saying this.

Because of the part you agreed makes sense: rsync reads each file with the sequence `open()/read()/.../read()/close()`, but those files are on the network mount ("/Volumes/mercury"), so each `read()` of size `#define IO_BUFFER_SIZE (32*1024)` is a network round trip.
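
A rough back-of-the-envelope using the numbers from the post (assuming each `read()` is one synchronous round trip and the sends barely overlap with them):

    62,947,785,101 B / 32,768 B per read()  ≈  1.9 million read() calls
    8 min ≈ 480 s;  480 s / 1.9 million     ≈  0.25 ms per read()

0.25 ms per round trip is entirely plausible for a file-share protocol over a LAN, so serial 32 KiB reads alone are enough to account for the 8 minutes.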