ttfvjktesd | 5 months ago

You are under the assumption that only Ceph (and similar complex software) requires staff, whereas a plain 30 PB of storage can be operated basically just by rebooting from time to time.

I think that anyone with actual experience of operating thousands of physical disks in datacenters would challenge this assumption.

devanshp|5 months ago

we have 6 months of experience operating thousands of physical disks in datacenters now! it's about a couple hours a month of employee time in steady-state.

ttfvjktesd|5 months ago

What about all the other infrastructure? Since you are obviously not using the cloud, you must have massive numbers of GPUs and operating systems to run. All of that has to work together; it's not just a matter of watching the physical disks and you're all set.

Don't get me wrong, I buy the actual numbers regarding hardware costs. But presenting the rest as basically a one-man show in terms of maintenance hours is where I'm very sceptical.

rtp4me|5 months ago

Not really. Have spare drives on the shelf and use the "remote-hands" feature from the CoLo provider. Just open a ticket to have the drive swapped. Pretty easy. For remote server connections just use IPMI/iKVM and iPXE. Again, not too difficult.

The biggest hurdle is getting a management system in place to alert you when something goes wrong, especially at this size. Grafana, Loki, monit, etc. are all good tools to leverage for quick fault identification.
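
To make the alerting idea concrete, here is a rough Python sketch of the kind of check that could sit behind such a system: it takes per-disk health reports and produces one remote-hands ticket line per drive that looks like it needs swapping. Everything in it is an assumption for illustration, not anyone's actual setup: the `DiskReport` fields, the `max_reallocated` threshold of 10, and the ticket wording are all hypothetical, and the collection of the reports themselves (SMART polling, exporters, etc.) is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DiskReport:
    """Hypothetical per-disk health record; field names are illustrative."""
    host: str
    device: str
    smart_passed: bool          # overall SMART self-assessment
    reallocated_sectors: int    # raw reallocated sector count


def needs_swap(report: DiskReport, max_reallocated: int = 10) -> bool:
    """Flag a disk if SMART failed or reallocations exceed the threshold.

    The default threshold is an arbitrary assumption, not a recommendation.
    """
    return (not report.smart_passed) or report.reallocated_sectors > max_reallocated


def swap_tickets(reports: Iterable[DiskReport], max_reallocated: int = 10) -> List[str]:
    """Return one ticket line per disk that should be swapped by remote hands."""
    tickets = []
    for r in reports:
        if needs_swap(r, max_reallocated):
            reason = "SMART failed" if not r.smart_passed else (
                f"{r.reallocated_sectors} reallocated sectors"
            )
            tickets.append(f"Swap {r.device} on {r.host}: {reason}")
    return tickets


if __name__ == "__main__":
    sample = [
        DiskReport("stor-01", "/dev/sda", True, 0),
        DiskReport("stor-01", "/dev/sdb", False, 3),
        DiskReport("stor-02", "/dev/sdc", True, 42),
    ]
    for line in swap_tickets(sample):
        print(line)
```

In practice the output of something like `swap_tickets` would feed whatever ticketing or paging channel the CoLo provider and the team already use; the point is only that the decision logic per disk can stay this small.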