I can't verify 40Gbps because I have never had access to a pipe that fast, but when I ran this tool on an AWS instance with a 20Gbps connection, it easily saturated the link and maintained that speed for the duration of the transfer.
Galanwe|8 months ago
I just spawned an r6a.16xlarge with a 25 Gbps NIC, created a 10GB file, and uploaded it to an S3 bucket in the same region through a local S3 VPC endpoint.
Downloading that 10GB file to /dev/shm with s5cmd took 24s, all while spawning 20 or so threads that were mostly idle waiting on IO.
The same test using a Python tool (http://github.com/NewbiZ/s3pd) with the same number of workers took 10s.
Cranking up the worker count of the latter library until there is no more speedup, I can reach 6s with 80 workers. That is, 10/6 ≈ 1.6GB/s, which seems to confirm my previous comment.
What am I doing wrong?
Galanwe|8 months ago
Okay, I found the trick, buried in the benchmark setup of s5cmd.
The claimed numbers are _not_ reached against S3, but against a custom server emulating the S3 API, hosted on the client machine.
I think this is very misleading, since these benchmark numbers are not reachable in any real-life scenario. It also shows that there is very little point in using s5cmd over other tools: beyond ~1.6GB/s the throttling comes from S3, not from the client, so any tool able to sustain 1.6GB/s is enough.
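For concreteness, here is the back-of-the-envelope arithmetic behind the numbers in this thread (a sketch only; the 10GB file size and the 24s/10s/6s timings are the ones quoted above, and GB is taken as 10^9 bytes):

```python
# Convert the quoted download times for a 10 GB object into throughput,
# both in GB/s and in Gbps, to compare against the instance's NIC.
FILE_GB = 10  # file size from the thread, assumed to mean 10^9-byte gigabytes

def throughput(seconds):
    """Return (GB/s, Gbps) for downloading FILE_GB gigabytes in `seconds`."""
    gb_per_s = FILE_GB / seconds
    gbps = gb_per_s * 8  # 1 byte = 8 bits
    return gb_per_s, gbps

for label, secs in [("s5cmd, ~20 workers", 24),
                    ("s3pd, same worker count", 10),
                    ("s3pd, 80 workers", 6)]:
    gbs, gbps = throughput(secs)
    print(f"{label}: {gbs:.2f} GB/s = {gbps:.1f} Gbps")

# Even the best case (~1.67 GB/s, i.e. ~13.3 Gbps) is only about half of the
# r6a.16xlarge's 25 Gbps NIC, consistent with the claim that S3, not the
# client side, is the bottleneck at that point.
```

This also makes the "little point in using s5cmd" argument concrete: once any client sustains roughly 13 Gbps against S3, extra client-side parallelism stops helping.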