That sftp is not the fastest protocol is well known to rclone users.
The main problem is that it packetizes the data and waits for responses, effectively re-implementing the TCP window inside a TCP stream. You can only have so many packets outstanding in the standard SFTP implementation (64 is the default) and the buffers are quite small (32k by default), which gives a total of 2MB of outstanding data. The highest transfer rate you can achieve therefore depends on the latency of the link: if you have 100 ms of latency you can send at most 20 MB/s, which is about 160 Mbit/s - nowhere near filling a fast, wide pipe.
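The arithmetic above can be sketched as follows (a toy calculation, not rclone code; the defaults are the ones quoted in this thread):

```python
def sftp_max_throughput(requests_in_flight: int, buffer_bytes: int,
                        rtt_seconds: float) -> float:
    """Upper bound on SFTP transfer rate, in bytes per second.

    With N requests of B bytes outstanding, at most N*B bytes can be
    unacknowledged at once, so throughput is capped at N*B / RTT.
    """
    return requests_in_flight * buffer_bytes / rtt_seconds

# Stock defaults: 64 requests of 32 KiB each, over a 100 ms path.
rate = sftp_max_throughput(64, 32 * 1024, 0.1)
print(f"{rate / 1e6:.1f} MB/s ({rate * 8 / 1e6:.0f} Mbit/s)")

# Tweaking the buffers up to 256 KiB raises the ceiling eightfold.
print(f"{sftp_max_throughput(64, 256 * 1024, 0.1) / 1e6:.1f} MB/s")
```

Note that latency, not raw bandwidth, sets the ceiling: halving the RTT doubles the maximum rate with no change to the server.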
You can tweak the buffer size (up to 256k I think) and the number of outstanding requests, but you hit limits in the popular servers quite quickly.
To mitigate this, rclone supports multipart concurrent uploads and downloads over sftp, which means you can run several streams in parallel, each at that per-stream ceiling, which helps.
The fastest protocols are the TLS/HTTP-based ones which stream data. They open up the TCP window properly, and the kernel and networking stack are well optimized for this use. WebDAV is a good example.
> The fastest protocols are the TLS/HTTP based ones which stream data.
I think maybe you are referring to QUIC [0]? It'd be interesting to see some userspace clients/servers for QUIC that compete with Aspera's FASP [1] and operate on a point-to-point basis like scp. Both use UDP to avoid the overhead of TCP.
If you want to see the impact that the flow-control buffer size has on OpenSSH, I put up a graph based on data collected last week. Basically, it has a huge impact on throughput.
When you are limited to SSH as the transport, you can still do better than scp or sftp by using rsync with --rsh="ssh ...".
Besides being faster, with rsync and the right command options you can be certain that it makes exact file copies, together with any file metadata, even between different operating systems and file systems.
I have not checked if in recent years all the bugs of scp and sftp have been fixed, but some years ago there were cases when scp and sftp were losing silently, without warnings, some file metadata (e.g. high-precision timestamps, which were truncated, or extended file attributes).
I use ssh every day, but it has been decades since I last used scp or sftp, except when I have to connect to a server that I cannot control and where rsync happens not to be installed. Even on such servers, if I can put an executable in my home directory, I first copy an rsync binary there with scp, then do any other copies with that rsync.
Any chance this work can be upstreamed into mainline OpenSSH? I'd love to have better performance for SSH, but I'm probably not going to install and remember to use this for the few times it would be relevant.
I doubt this would ever be accepted upstream. That said, if you want speed, play around with lftp [1]. It has a mirror subsystem that can replicate much of rsync's functionality against a chroot'd sftp-only destination, and it can use multiple TCP/SFTP streams both across a batch of files and within a single file, meaning you can saturate just about any upstream. I have used this for transferring massive Postgres backups, and because I am paranoid when using applications that automatically multipart-transfer files, I include a checksum file for the source and then verify the destination files.
The only downside I have found using lftp is that, since there is no corresponding daemon on the destination (as there is with rsync), directory enumeration can be slow if there are a lot of nested sub-directories. Oh, and the syntax is a little odd, for me anyway; I always have to look at my existing scripts when setting up new automation.
Demo to play with, download only. Try different values. This will be faster on your servers, especially anything within the data center.

    ssh mirror@mirror.newsdump.org   # do this once to accept the key, as ssh-keyscan will choke on my big banner
    mkdir -p /dev/shm/test && cd /dev/shm/test
    lftp -u mirror, -e "mirror --parallel=4 --use-pget=8 --no-perms --verbose /pub/big_file_test/ /dev/shm/test;bye" sftp://mirror.newsdump.org
For automation add --loop to repeat job until nothing has changed.
OpenSSH is from the people at OpenBSD, which means performance improvements have to be carefully vetted against bugs, and, judging by the fact that they're still on FFS and still lack TRIM support in 2025, that will not happen.
I admittedly don't really know how SSH is built, but it looks to me like the patch that "makes" it HPN-SSH is already present upstream [1]; it's just not applied by default?
Nixpkgs seems to allow you to build the pkg with the patch [2].
There’s a third party ZFS utility (zrepl, I think) that solves this in a nice way: ssh is used as a control channel to coordinate a new TLS connection over which the actual data is sent. It is considerably faster, apparently.
I don't think it comes as a surprise that you can improve performance by re-implementing ciphers, but what is the security trade-off? Well-audited implementations of ciphers are often intentionally less performant in order to operate in constant time and avoid side-channel attacks. Is it even possible to do constant-time operations while being multithreaded?
The only change I see here that is probably harmless and a speed boost is using AES-NI for AES-CTR. This should probably be an upstream patch. The rest is more iffy.
The parallel ciphers are built using OpenSSL primitives; we aren't reimplementing the cipher itself in any way. Since counter-mode ciphers use a monotonically increasing counter, you can precompute the blocks in advance, which is what we do: we keep a cache of precomputed keystream data and pull the correct block off as needed. This gets around the need to have the application compute the blocks serially, which can be a bottleneck at higher throughput rates.
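The precomputation trick can be illustrated with a toy counter-mode sketch (SHA-256 stands in for the block cipher here; this is not the HPN-SSH code, just the idea it describes):

```python
import hashlib

def keystream_block(key: bytes, counter: int) -> bytes:
    # Toy stand-in for the block cipher: SHA-256(key || counter).
    # In real CTR mode this would be AES applied to the counter block.
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()

def precompute_keystream(key: bytes, start: int, nblocks: int) -> list:
    # The counter sequence is known in advance, so these blocks are
    # independent of the plaintext and of each other, and could be filled
    # in by worker threads before the data even arrives.
    return [keystream_block(key, c) for c in range(start, start + nblocks)]

def ctr_xor(data: bytes, keystream: list) -> bytes:
    # On the hot path, encryption (and decryption) is then just a cheap XOR.
    flat = b"".join(keystream)[: len(data)]
    return bytes(a ^ b for a, b in zip(data, flat))

key = b"example-key"
cache = precompute_keystream(key, 0, 4)   # filled ahead of time
ct = ctr_xor(b"hello, bulk data", cache)  # per-packet work is only the XOR
assert ctr_xor(ct, cache) == b"hello, bulk data"  # CTR is its own inverse
```

The serial dependency lives entirely in the counter, not the data, which is why the keystream cache can be refilled concurrently without changing what bytes go on the wire.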
The main performance improvement is from the buffer normalization. This can provide, on the right path, a 100x improvement in throughput performance without any compromise in security.
This has been around for years (since at least the mid-2000s). Gentoo used to have this patchset available as a USE flag on net-misc/openssh, but some time ago it was moved to net-misc/openssh-contrib (also configurable by USE flag).
There are some minor usability bugs and I think both endpoints need to have it installed to take advantage. I remember asking ages ago why it wasn’t upstreamed, there were reasons…
To be honest, there was a period of time around 2010 or 2012 when I simply wasn't maintaining it as well as I should have been. I wouldn't have upstreamed it then either. That's changed a lot since then.
As an aside - you only really need HPN-SSH on the receiving side of the bulk data to get the buffer normalization performance benefits. It turns out the bottleneck is almost entirely on the receiver, and the client will send out data as quickly as you like. At least it was like that until OpenSSH 8.8, when changes were made such that the client would crash if the send buffer exceeded 16MB. So we had to limit OpenSSH-to-HPN-SSH flows to a maximum of 16MB of receive space, which is annoying, but that's still going to be a win for a lot of users.
This is very cool and I think I'll give it a try, though I'm wary of using a forked SSH, so I would love to see things land upstream.
I've been using mosh now for over a decade and it is amazing. Add on rsync for file transfers and I've felt pretty set. If you haven't checked out mosh, you should definitely do so!
The bottleneck in SSH is entirely on the receiving side, so as long as the receiver is using HPN-SSH you will see some performance improvement if the BDP of the path exceeds 2MB. Note: because of changes made to OpenSSH in 8.8, the maximum buffer with OpenSSH as the sender is 16MB. In an HPN-to-HPN connection the maximum receive buffer is 128MB.
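As a rough rule of thumb for when this matters, compare the path's bandwidth-delay product against those buffer limits (a sketch; the 1 Gbit/s / 80 ms link is a made-up example, while the 2MB and 128MB figures are the ones quoted above):

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> float:
    # Bandwidth-delay product: how much data must be in flight to keep
    # the pipe full at a given rate and round-trip time.
    return bandwidth_bits_per_sec / 8 * rtt_seconds

# Hypothetical 1 Gbit/s path with 80 ms RTT: ~10 MB must be in flight,
# past stock OpenSSH's ~2 MB window but well within HPN-SSH's 128 MB ceiling.
print(f"{bdp_bytes(1e9, 0.08) / 1e6:.0f} MB")
```

If the BDP comes out under 2MB, the stock buffers already cover the path and HPN-SSH's receive-side changes won't buy much.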
0. https://en.wikipedia.org/wiki/QUIC
1. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol
riobard: Why is it designed this way? What problems is it supposed to solve?

rapier1:
https://gist.github.com/rapier1/325de17bbb85f1ce663ccb866ce2...
[1] - https://linux.die.net/man/1/lftp
[1] https://github.com/freebsd/freebsd-ports/blob/main/security/...
[2] https://github.com/NixOS/nixpkgs/blob/d85ef06512a3afbd6f9082...