sprachspiel | 2 years ago

Just tested an i4i.32xlarge:

  $ lsblk
  NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  loop0          7:0    0  24.9M  1 loop /snap/amazon-ssm-agent/7628
  loop1          7:1    0  55.7M  1 loop /snap/core18/2812
  loop2          7:2    0  63.5M  1 loop /snap/core20/2015
  loop3          7:3    0 111.9M  1 loop /snap/lxd/24322
  loop4          7:4    0  40.9M  1 loop /snap/snapd/20290
  nvme0n1      259:0    0     8G  0 disk 
  ├─nvme0n1p1  259:1    0   7.9G  0 part /
  ├─nvme0n1p14 259:2    0     4M  0 part 
  └─nvme0n1p15 259:3    0   106M  0 part /boot/efi
  nvme2n1      259:4    0   3.4T  0 disk 
  nvme4n1      259:5    0   3.4T  0 disk 
  nvme1n1      259:6    0   3.4T  0 disk 
  nvme5n1      259:7    0   3.4T  0 disk 
  nvme7n1      259:8    0   3.4T  0 disk 
  nvme6n1      259:9    0   3.4T  0 disk 
  nvme3n1      259:10   0   3.4T  0 disk 
  nvme8n1      259:11   0   3.4T  0 disk

Since nvme0n1 is the EBS boot volume, we have 8 SSDs. And here's the read bandwidth for one of them:

  $ sudo fio --name=bla --filename=/dev/nvme2n1 --rw=read --iodepth=128 --ioengine=libaio --direct=1 --blocksize=16m
  bla: (g=0): rw=read, bs=(R) 16.0MiB-16.0MiB, (W) 16.0MiB-16.0MiB, (T) 16.0MiB-16.0MiB, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  ^Cbs: 1 (f=1): [R(1)][0.5%][r=2704MiB/s][r=169 IOPS][eta 20m:17s]

So we should have a total bandwidth of 2.7 × 8 ≈ 22 GB/s. Not that great for 2024.

Aachen | 2 years ago

So if I'm reading it right, the quote from the original article that started this thread was ballpark correct?

> we are still stuck with 2 GB/s per SSD

Versus the ~2.7 GiB/s your benchmark shows (it's a bit hard to know where to look on mobile with all that line-wrapped output, and when not familiar with fio; not your fault, but that's why I'm double-checking my conclusion).
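
(For the unit comparison: fio reported 2704 MiB/s, which is roughly 2.8 GB/s, or about 2.6 GiB/s, so around 40% above the quoted 2 GB/s per SSD.)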

Nextgrid | 2 years ago

If you still have this machine, I wonder if you can get this bandwidth in parallel across all the SSDs. There could be some hypervisor-level or host-level bottleneck, meaning that while any SSD in isolation gives you the observed bandwidth, you can't actually reach 8x that when you access them all at once.
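
Not something the thread answers, but for anyone with access to a similar instance, a minimal fio job file along these lines (device names taken from the lsblk output above; block size and runtime are arbitrary choices) would read from all eight instance-store SSDs at once:

  $ cat all-ssds.fio
  [global]
  rw=read
  ioengine=libaio
  iodepth=128
  direct=1
  bs=16m
  runtime=30
  time_based
  group_reporting
  [nvme1n1]
  filename=/dev/nvme1n1
  [nvme2n1]
  filename=/dev/nvme2n1
  [nvme3n1]
  filename=/dev/nvme3n1
  [nvme4n1]
  filename=/dev/nvme4n1
  [nvme5n1]
  filename=/dev/nvme5n1
  [nvme6n1]
  filename=/dev/nvme6n1
  [nvme7n1]
  filename=/dev/nvme7n1
  [nvme8n1]
  filename=/dev/nvme8n1
  $ sudo fio all-ssds.fio

With group_reporting set, fio sums the per-device numbers into one aggregate figure, which is the quickest way to see whether a host-level bottleneck kicks in below 8x the single-drive rate.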

dekhn | 2 years ago

Can you adjust --blocksize to correspond to the block size on the device? And try with/without --direct=1?
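
For reference, the logical/physical sector sizes the device exposes can be checked with lsblk, and a re-run at that size would look something like this (other parameters copied from the command above; 4k assumes the drive reports 4096-byte sectors, substitute whatever lsblk shows):

  $ lsblk -o NAME,LOG-SEC,PHY-SEC,MIN-IO /dev/nvme2n1
  $ sudo fio --name=bla --filename=/dev/nvme2n1 --rw=read --iodepth=128 --ioengine=libaio --direct=1 --blocksize=4k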

zokier | 2 years ago

I wonder if there is some tuning that needs to be done here; it seems surprising that the advertised rate would be this far off otherwise.

jeffbee | 2 years ago

I would start with the LBA format, which is likely to be suboptimal for compatibility.
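
For anyone checking this, nvme-cli (assuming it's installed, and that the virtualized instance-store controller supports these admin commands) can show which LBA format the namespace currently uses and which others it supports:

  $ sudo nvme id-ns /dev/nvme2n1 -H | grep 'LBA Format'
  # switching to another format index is destructive and only works if the
  # controller accepts Format NVM:
  # sudo nvme format /dev/nvme2n1 --lbaf=<index of the 4 KiB format>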

dangoodmanUT | 2 years ago

That's 16 MiB blocks, not 4 KiB.

wtallis | 2 years ago

Last I checked, Linux splits up massive IO requests like that before sending them to the disk. But there's no benefit to splitting a sequential IO request all the way down to 4kB.
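
The split size is visible in sysfs for the device benchmarked above; a 16 MiB submission gets chopped into pieces no larger than max_sectors_kb before it reaches the drive:

  $ cat /sys/block/nvme2n1/queue/max_hw_sectors_kb
  $ cat /sys/block/nvme2n1/queue/max_sectors_kb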