(no title)

ople | 10 years ago

Author here: We considered that, but since the access pattern was likely to be essentially random, the performance would have been terrible. Because of the break we had nearly 1000 clustered servers sitting idle, so it was reasonably quick to do the ramdisk trick.

icefo | 10 years ago

I'm sorry, but I don't understand something. What did you put on that big ramdisk? The metadata?

ople | 10 years ago

We copied the raw image file of the corrupted metadata filesystem (MDT in Lustre lingo) to the ramdisk.

Then we mounted it via loopback and copied the files into tarballs. The really slow part on the spinning disk was reading the millions of files from the metadata FS.

The basic process of the file-level backup is documented here: https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfu...
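
For illustration only, here is a minimal Python sketch of the sequence described above: stage the image on the ramdisk, loopback-mount it, then pack the files into tarballs. These are not the exact commands that were used. The paths, the function names, the `ldiskfs` mount type, and the one-tarball-per-top-level-directory split are all assumptions. Note also that Python's `tarfile` does not capture extended attributes, which a real MDT backup needs, so treat this as a picture of the data flow rather than a usable backup tool.

```python
import tarfile
from pathlib import Path


def staging_commands(image, ramdisk_dir, mount_point):
    """Build (but do not run) the commands for staging the image.

    All paths are hypothetical. The commands need root and an existing
    ramdisk (e.g. a tmpfs mount) large enough to hold the image.
    """
    staged = str(Path(ramdisk_dir) / Path(image).name)
    return [
        # Copy the raw image of the metadata filesystem into RAM.
        ["cp", image, staged],
        # Loopback-mount it read-only; the ldiskfs type is an assumption.
        ["mount", "-t", "ldiskfs", "-o", "loop,ro", staged, mount_point],
    ]


def archive_tree(root, out_dir, prefix="mdt"):
    """Write one uncompressed tarball per top-level entry under root.

    Returns the list of tarball paths. Extended attributes are NOT
    preserved by tarfile, so this is not a complete MDT backup.
    """
    root = Path(root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tarballs = []
    for entry in sorted(root.iterdir()):
        target = out_dir / f"{prefix}-{entry.name}.tar"
        with tarfile.open(str(target), "w") as tar:
            # arcname keeps member paths relative to the mount point.
            tar.add(str(entry), arcname=entry.name)
        tarballs.append(target)
    return tarballs


if __name__ == "__main__":
    # Print the staging commands instead of executing them.
    for cmd in staging_commands("/data/mdt.img", "/mnt/ramdisk", "/mnt/mdt"):
        print(" ".join(cmd))
```

With the image in RAM, the random reads across millions of small files no longer hit the spinning disk, which is why the tarball step becomes the fast part in this arrangement.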