nickcw | 2 months ago
The main problem is that it packetizes the data and waits for responses, effectively re-implementing the TCP window inside a TCP stream. The standard SFTP implementation only allows so many requests to be outstanding (64 by default) and the buffers are quite small (32k by default), which gives a total of 2 MB of outstanding data. The highest transfer rate you can reach therefore depends on the latency of the link. With 100 ms of round-trip latency you can send at most 2 MB every 100 ms, i.e. 20 MB/s, which is about 160 Mbit/s - nowhere near filling a fast wide pipe.
You can tweak the buffer size (up to 256k I think) and the number of outstanding requests, but you hit limits in the popular servers quite quickly.
To mitigate this, rclone lets you do multipart concurrent uploads and downloads to sftp, which means you can have multiple streams each operating at about 160 Mbit/s, which helps.
The fastest protocols are the TLS/HTTP based ones which stream data. They open up the TCP window properly, and the kernel and networking stack are well optimized for this use. WebDAV is a good example.
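For anyone who wants to plug in their own numbers, here is a rough sketch of the window arithmetic from this comment. The 64 requests, 32k buffers and 100 ms come from the text above; the function name and the stream counts in the loop are just illustrative assumptions, and the model ignores protocol overhead, so treat the result as an upper bound rather than a measurement.

```python
def max_throughput_bytes_per_s(outstanding_requests: int,
                               request_size_bytes: int,
                               rtt_seconds: float) -> float:
    """Upper bound on throughput when only a fixed amount of data may be in flight.

    At most one full window (requests * size) can be delivered per round trip.
    """
    window_bytes = outstanding_requests * request_size_bytes
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    requests = 64        # default number of outstanding requests (from the comment)
    size = 32 * 1024     # default buffer size, 32k (from the comment)
    rtt = 0.100          # 100 ms of latency (from the comment)

    per_stream = max_throughput_bytes_per_s(requests, size, rtt)
    print(f"window per stream: {requests * size / 2**20:.1f} MiB")
    print(f"one stream: {per_stream / 2**20:.1f} MiB/s "
          f"= {per_stream * 8 / 1e6:.0f} Mbit/s")

    # Several independent streams scale the ceiling linearly
    # (stream counts here are arbitrary examples).
    for streams in (2, 4, 8):
        total = streams * per_stream
        print(f"{streams} streams: {total * 8 / 1e6:.0f} Mbit/s")
```

The only inputs that matter are the window size and the round-trip time, which is why raising either the buffer size or the number of outstanding requests, or running several streams, raises the ceiling.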
adolph|2 months ago
I think maybe you are referring to QUIC [0]? It'd be interesting to see some userspace clients/servers for QUIC that compete with Aspera's FASP [1] and operate on a point to point basis like scp. Both use UDP to decrease the overhead of TCP.
0. https://en.wikipedia.org/wiki/QUIC
1. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol
rapier1|2 months ago
Veserv|2 months ago
To be fair, that is not really a problem of the protocol, just the implementations. You can comfortably drive 10x that bandwidth with a reasonable design.
[1] https://microsoft.github.io/msquic/
nickcw|2 months ago
I just think for streaming lots of data quickly HTTP/1.x plus TLS plus TCP has received many more engineering hours of optimization than any other combo.
riobard|2 months ago
Why is it designed this way? What problems is it supposed to solve?
nickcw|2 months ago
SFTP was designed as a remote file system access protocol rather than as a way to transfer a single file like scp.
I suspect that the root of the problem is that SFTP works over a single SSH channel. SSH connections can have multiple channels but usually the server binds a single channel to a single executable so it makes sense to use only a single channel.
Everything flows from that decision - packetisation becomes necessary, because otherwise you would have to wait for all the files to transfer before you could do anything else (e.g. list a directory), and that is no good for remote filesystem access.
Perhaps the packets could have been streamed but the way it works is more like an RPC protocol with requests and responses. Each request has a serial number which is copied to the response. This means the client can have many requests in-flight.
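To make the request/response idea concrete, here is a toy, in-memory sketch of a client that keeps a limited number of read requests in flight and matches each response to its request by ID. It does not speak the real SFTP wire format; the function name, the tiny chunk size and the shuffled delivery order are assumptions chosen only to show why the ID in each response is useful.

```python
import random


def pipelined_read(data: bytes, chunk_size: int = 4,
                   max_in_flight: int = 3, seed: int = 0) -> tuple[bytes, int]:
    """Read `data` in chunks with at most `max_in_flight` requests outstanding.

    Returns the reassembled bytes and the peak number of outstanding requests.
    """
    rng = random.Random(seed)
    pending = {}    # request id -> file offset the request asked for
    wire = []       # responses "on the wire": (request id, payload)
    out = bytearray(len(data))
    next_id = 0
    next_offset = 0
    peak = 0

    while next_offset < len(data) or pending:
        # Fill the window: send requests until the in-flight limit is reached.
        while next_offset < len(data) and len(pending) < max_in_flight:
            pending[next_id] = next_offset
            # The toy "server" answers at once; the reply waits on the wire.
            wire.append((next_id, data[next_offset:next_offset + chunk_size]))
            next_id += 1
            next_offset += chunk_size
        peak = max(peak, len(pending))

        # Deliver one response in arbitrary order; the ID tells the client
        # which request (and therefore which offset) it belongs to.
        req_id, payload = wire.pop(rng.randrange(len(wire)))
        offset = pending.pop(req_id)
        out[offset:offset + len(payload)] = payload

    return bytes(out), peak


if __name__ == "__main__":
    original = bytes(range(100))
    result, peak = pipelined_read(original)
    print(result == original, peak)
```

A new request can only be sent once a response frees a slot, which is the window-like behaviour described at the top of the thread.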
There was a proposal for rclone to use scp for the data connections. So we'd use sftp for the day-to-day file listings, creating directories etc., but do actual file transfers with scp. Scp uses one SSH channel per file so doesn't suffer from the same problems as sftp. I think we abandoned that idea though, as many sftp servers aren't configured with scp as well. Also, since OpenSSH 9.0 (released April 2022), the scp command uses the SFTP protocol by default anyway. This was done to fix various vulnerabilities in scp, as I understand it.
charonn0|2 months ago
Veserv|2 months ago
It just has an in-flight message/queue limit like basically every other communication protocol. You can only buffer so many messages and so much space for responses before you run out of room. The problem there is just that the default amount of buffering is very low and is not adaptive to the available space/bandwidth.
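As a rough illustration of what "adaptive" sizing would have to aim for, this sketch computes how many requests must be in flight to cover the bandwidth-delay product of a link. The function name and the example link speeds are assumptions for illustration, not anything a particular client or server actually does.

```python
import math


def requests_needed(target_bits_per_s: float, rtt_seconds: float,
                    request_size_bytes: int) -> int:
    """Number of in-flight requests needed to keep a link of the given speed full."""
    # Bandwidth-delay product: bytes that must be in flight to fill the pipe.
    bdp_bytes = target_bits_per_s / 8 * rtt_seconds
    return math.ceil(bdp_bytes / request_size_bytes)


if __name__ == "__main__":
    rtt = 0.100  # 100 ms, as in the example earlier in the thread
    for gbit in (1, 10):
        for size_k in (32, 256):
            n = requests_needed(gbit * 1e9, rtt, size_k * 1024)
            print(f"{gbit} Gbit/s, {size_k}k requests: {n} in flight")
```

Compared with a default of 64 outstanding requests of 32k each, the gap on a long fast link is large, which is the point about the defaults being low.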
rapier1|2 months ago
https://gist.github.com/rapier1/325de17bbb85f1ce663ccb866ce2...
adrian_b|2 months ago
Besides being faster, with rsync and the right command options you can be certain that it makes exact file copies, together with any file metadata, even between different operating systems and file systems.
I have not checked whether all the bugs of scp and sftp have been fixed in recent years, but some years ago there were cases where scp and sftp silently lost some file metadata without any warning (e.g. high-precision timestamps, which were truncated, or extended file attributes).
I use ssh every day, but it has been decades since I last used scp or sftp, except when I have to connect to a server that I do not control and where rsync happens not to be installed. Even on such servers, if I am allowed to add an executable to my home directory, I first copy an rsync binary there with scp, then do all other copies with that rsync.
imcritic|2 months ago
formerly_proven|2 months ago