My 71 TiB ZFS NAS After 10 Years and Zero Drive Failures

419 points | louwrentius | 1 year ago | louwrentius.com | reply

299 comments

[+] orbital-decay|1 year ago|reply
Do you have a drive rotation schedule?

24 drives. Same model. Likely the same batch. Similar wear. Imagine most of them failing at the same time, and the rest failing as you're rebuilding it due to the increased load, because they're already almost at the same point.

Reliable storage is tricky.

[+] throw0101c|1 year ago|reply
> This NAS is very quiet for a NAS (video with audio).

Big (large-radius) fans can move a lot of air even at low RPM, and they're much more energy efficient.

Oxide Computer, in one of their presentations, talks about using 80mm fans, as they are quiet and (more importantly) don't use much power. They observed that in other servers as much as 25% of the power went just to the fans, versus ~1% in theirs:

* https://www.youtube.com/shorts/hTJYY_Y1H9Q

* https://www.youtube.com/watch?v=4vVXClXVuzE

[+] daemonologist|1 year ago|reply
Interesting - I'm used to desktop/workstation hardware where 80mm is the smallest standard fan (aside from 40mm's in the near-extinct Flex ATX PSU), and even that is kind of rare. Mostly you see 120mm or 140mm.
[+] louwrentius|1 year ago|reply
+1 for mentioning 0xide. I love that they went this route and that stat is interesting. I hate the typical DC high RPM small fan whine.

I also hope that they do something 'smart' when they control the fan speed ;-)

[+] chiph|1 year ago|reply
My Synology uses two 120mm fans and you can barely hear them (it's on the desk next to me). I'm sold on the idea of moving more volume at less speed.

(which I understand can't happen in a 1U or 2U chassis)

[+] sss111|1 year ago|reply
just curious, are you associated with them, as these are very obscure youtube videos :D

Love it though, even the reduction in fan noise is amazing. I wonder why nobody had thought of it before, it seems so simple.

[+] turnsout|1 year ago|reply
I’ve heard the exact opposite advice (keep the drives running to reduce wear from power cycling).

Not sure what to believe, but I like having my ZFS NAS running so it can regularly run scrubs and check the data. FWIW, I’ve run my 4 drive system for 10 years with 2 drive failures in that time, but they were not enterprise grade drives (WD Green).

[+] CTDOCodebases|1 year ago|reply
I think a lot of the advice around keeping the drives running is about avoiding wear caused by spin downs and startups i.e. keeping the "Start Stop Cycles" low.

There's a difference between spinning a drive up/down once or twice a day and spinning it down every 15 minutes or less.

Also, WD Green drives are not recommended for NAS usage. In the past they used to park the read/write head every few seconds, which is fine if data is accessed infrequently, but on a server that is accessed continuously this can result in constant wear, which leads to premature failure.

[+] Dalewyn|1 year ago|reply
>Not sure what to believe

Keep them running.

Why?:

* The read/write heads experience literally next to no wear while they are floating above the platters. They physically land onto shelves or onto landing zones on the platters themselves when turned off; landing and takeoff are by far the most wear the heads will suffer.

* Following on the above, in the worst case the read/write heads might be torn off during takeoff due to stiction.

* Bearings will last longer; they might also seize up if left stationary for too long. Likewise the drive motor.

* The rush of current when turning on is an electrical stressor, no matter how minimal.

The only reasons to turn your hard drives off are to save power, reduce noise, or transport them.

[+] louwrentius|1 year ago|reply
Hard drives are often configured to spin down when idle for a certain time, which can cause many spinups and spindowns per day. So I don't buy this at all, but I don't have supporting evidence to back up that notion.
[+] bongodongobob|1 year ago|reply
This is completely dependent on access frequency. Do you have a bunch of different people accessing many files frequently? Are you doing frequent backups?

If so then yes, keeping them spinning may help improve lifespan by reducing frequent disk jerk. This is really only applicable when you're at a pretty consistent high load and you're trying to prevent your disks from spinning up and down every few minutes or something.

For a homelab, you're probably wasting way more money on electricity than you're saving in disk maintenance by leaving your disks spinning.

[+] foobarian|1 year ago|reply
> regularly run scrubs and check the data

Does this include some kind of built-in hash/checksum system to record e.g. md5 sums of each file and periodically test them? I have a couple of big drives for family media I'd love to protect with a bit more assurance than "the drive did not fail".
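(ZFS scrubs already verify every block against the pool's own checksums, but for plain drives you can layer something similar on top. A minimal sketch, assuming nothing beyond the standard library — the function names and manifest format are my own, not from any particular tool:)

```python
import hashlib
import os


def hash_file(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large media files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def build_manifest(root):
    """Walk a directory tree and map each relative path to its checksum."""
    manifest = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest


def verify(root, manifest):
    """Return the files whose current checksum no longer matches the manifest."""
    return [p for p, digest in manifest.items()
            if hash_file(os.path.join(root, p)) != digest]
```

Persist the manifest (e.g. as JSON) somewhere off the drive being checked, and re-run `verify` on a schedule; any path it returns has silently changed since the manifest was built.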

[+] manuel_w|1 year ago|reply
Discussions on checksumming filesystems usually revolve around ZFS and BTRFS, but has someone any experience with bcachefs? It's upstreamed in the linux kernel, I learned, and is supposed to have full checksumming. The author also seems to take filesystem responsibility seriously.

Is anyone using it around here?

https://bcachefs.org/

[+] ffsm8|1 year ago|reply
I tried it out on my homelab server right after the merge into the Linux kernel.

Took roughly one week for the whole raid to stop mounting because of the journal (8hdd, 2 ssd write cache, 2 nvme read cache).

The author responded on Reddit within a day. I tried his fix (which meant compiling the Linux kernel and booting from that), but it didn't resolve the issue. He sadly didn't respond after that, so I wiped and switched back to a plain mdadm RAID after a few days of waiting.

I had everything important backed up, obviously (though I did lose some unimportant data), but it did remind me that bleeding edge is indeed ... Unstable

The setup process and features are fantastic, however; simply being able to add a disk and flag it as read/write cache feels great. I'm certain I'll give it another try in a few years, after it's had some time in the oven.

[+] olavgg|1 year ago|reply
It is marked experimental, and since it was merged into the kernel there have been a few major issues that have been resolved. I wouldn't risk production data on it, but for a home lab it could be fine. But you need to ask yourself: how much time are you willing to spend if something goes wrong? I have also been running ZFS for 15+ years, and I've seen a lot of crap because of bad hardware. But with good enterprise hardware it has worked flawlessly.
[+] eru|1 year ago|reply
I'm using it. It's been ok so far, but you should have all your data backed up anyway, just in case.

I'm trying a combination where I have an SSD (of about 2TiB) in front of a big hard drive (about 8 TiB) and using the SSD as a cache.

[+] DistractionRect|1 year ago|reply
I'm optimistic about it, but probably won't switch over my home lab for a while. I've had quirks with my (now legacy) zsys + zfs on root for Ubuntu, but since it's a common config that's been widely used for years, it's pretty easy to find support.

I probably won't use bcachefs until a similar level of adoption/community support exists.

[+] rollcat|1 year ago|reply
Can't comment on bcachefs (I think it's still early), but I've been running with bcache in production on one "canary" machine for years, and it's been rock-solid.
[+] rnxrx|1 year ago|reply
In my experience the environment where the drives are running makes a huge difference in longevity. There's a ton more variability in residential contexts than in data center (or even office) space. Potential temperature and humidity variability is a notable challenge but what surprised me was the marked effect of even small amounts of dust.

Many years ago I was running an 8x500G array in an old Dell server in my basement. The drives were all factory-new Seagates - 7200RPM and may have been the "enterprise" versions (i.e. not cheap). Over 5 years I ended up averaging a drive failure every 6 months. I ran with 2 parity drives, kept spares around and RMA'd the drives as they broke.

I moved houses and ended up with a room dedicated to lab stuff. With the same setup I ended up going another 5 years without a single failure. It wasn't a surprise that the new environment was better, but it was surprising how much better a cleaner, more stable environment ended up being.

[+] kalleboo|1 year ago|reply
A drive failure every 6 months almost sounds more like dirty power than dust. I've always kept my NAS/file servers in dusty residential environments (I have a nice fuzzy gray Synology logo visible right now) and never seen anything like that.
[+] ylee|1 year ago|reply
>Many years ago I was running an 8x500G array in an old Dell server in my basement. The drives were all factory-new Seagates - 7200RPM and may have been the "enterprise" versions (i.e. not cheap). Over 5 years I ended up averaging a drive failure every 6 months. I ran with 2 parity drives, kept spares around and RMA'd the drives as they broke.

Hah! I had a 16x500GB Seagate array and also averaged an RMA every six months. I think there was a firmware issue with that generation.

[+] stavros|1 year ago|reply
How does dust affect things? The drives are airtight.
[+] ffsm8|1 year ago|reply
> Losing the system due to power shenanigans is a risk I accept.

There is another (very rare) failure a UPS protects against, and that's imbalance in the electricity.

You can get a spike (up or down, both can be destructive) if there is construction in your area and something happens with the electricity, or lightning hits a pylon close enough to your house.

The first job I worked at had multiple servers die like that, roughly 10 yrs ago. It's the only time I've ever heard of such an issue, however.

To my understanding, a UPS protects against such spikes as well, as it will die before letting your servers get damaged.

[+] danw1979|1 year ago|reply
I’ve had firsthand experience of a lightning strike hitting some gear that I maintained…

My parents' house got hit right on the TV antenna, which was connected via coax down to the booster/splitter unit in the comms cupboard … then somehow it got onto the nearby network patch panel and fried every wired ethernet controller attached to the network, including those built into switch ports, APs, etc. In the network switch, the current destroyed the device's power supply too, as it was trying to get to ground I guess.

Still a bit of a mystery how it got from the coax to the cat5. Maybe a close parallel run the electricians put in somewhere?

Total network refit required, but thankfully there were no wired computers on site… I can imagine storage devices wouldn’t have fared very well.

[+] deltarholamda|1 year ago|reply
This depends very much on the type of UPS. Big, high dollar UPSes will convert the AC to DC and back to AC, which gives amazing pure sine wave power.

The $99 850VA APC you get from Office Depot does not do this. It switches from AC to battery very quickly, but it doesn't really do power conditioning.

If you can afford the good ones, they genuinely improve reliability of your hardware over the long term. Clean power is great.

[+] JonChesterfield|1 year ago|reply
Lightning took out a modem and some nearby hardware here about a week ago. Residential. The distribution of dead vs damaged vs nominally unharmed hardware points very directly at the copper wire carrying vdsl. Modem was connected via ethernet to everything else.

I think the proper fix for that is probably to convert to optical, run along a fibre for a bit, then convert back. It seems likely that electricity will take a different route in preference to the glass. That turns out to be disproportionately annoying to spec (not a networking guy, gave up after an hour trying to distinguish products) so I've put a wifi bridge between the vdsl modem and everything else. Hopefully that's the failure mode contained for the next storm.

Mainly posting because I have a ZFS array that was wired to the same modem as everything else. It seems to have survived the experience but that seems like luck.

[+] manmal|1 year ago|reply
We’ve had such spikes in an old apartment we were living in. I had no servers back then, but LED lamps annoyingly failed every few weeks. It was an old building from the 60s and our own apartment had some iffy quick fixes in the installation.
[+] int0x29|1 year ago|reply
Isn't this what a surge protector is for?
[+] louwrentius|1 year ago|reply
True, this is also what I mean with power shenanigans.

My server is off most of the time, disconnected. But even if it weren't, I'd just accept the risk.

[+] Gud|1 year ago|reply
Electronics is absolutely sensitive to this.

Please use filters.

[+] mvanbaak|1 year ago|reply
The 'secret' is not that you turn them off. It's simply luck.

I have 4TB HGST drives running 24/7 for over a decade. OK, not 24 hours a day but 8, and also 0 failures. But I'm also lucky, like you. Some of the people I know have had several RMAs with the same drives, so there's that.

My main question is: what is it that takes 71TB but can be turned off most of the time? Is this the server where you store backups?

[+] monocasa|1 year ago|reply
In fact, the conventional wisdom for a long time was to not turn them off if you want longevity. Bearings seize when cold for instance.
[+] louwrentius|1 year ago|reply
It could be luck, but with 24 drives it feels very lucky. Somebody with proper statistics knowledge could probably calculate, given a guesstimated 1% yearly failure rate, how likely it would be to have all 24 drives remaining.

And remember, my previous NAS with 20 drives also didn't have any failures. So N=44, how lucky must I be?

It's for residential usage, and if I need some data, I often just copy it over 10Gbit to a system that uses much less power and this NAS is then turned off again.
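(For what it's worth, that calculation is a one-liner if you assume independent drive failures at a constant annual rate — the function name and the 1% figure are just the guesstimate from the comment above:)

```python
def survival_probability(drives, years, annual_failure_rate=0.01):
    """P(zero failures) for `drives` independent drives over `years`,
    each surviving a given year with probability (1 - annual_failure_rate)."""
    per_drive = (1 - annual_failure_rate) ** years
    return per_drive ** drives


# 24 drives over 10 years at a 1% AFR: roughly a 9% chance of zero failures,
# so "very lucky" is about right.
p24 = survival_probability(24, 10)
```

Under those assumptions, getting through 10 years with all 24 drives intact happens in only about 1 in 11 universes; the earlier 20-drive NAS surviving as well makes the streak rarer still.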

[+] ryanjshaw|1 year ago|reply
> What is it that takes 71TB but can be turned off most of the time?

Still waiting for somebody to explain this to me as well.

[+] ggm|1 year ago|reply
There have been drives where power cycling was hazardous. So, whilst I agree with the approach, it shouldn't be assumed this is always good, all the time, for all people. Some SSDs need to be powered periodically. The duty cycle of a NAS probably meets that burden.

Probably good, and definitely cheaper power costs. Those drives with extra grease on the axle were a blip in time.

I wonder if Backblaze does a drive on-off lifetime stats model? I think they are in the always-on problem space.

[+] lawaaiig|1 year ago|reply
Regarding the intermittent power cutoffs during boot, it should be noted that the drives pull power from the 5V rail on startup: comparable drives typically draw up to 1.2A each. Combined with the maximum load of 25A on the 5V rail (Seasonic Platinum 860W), it's likely you'll experience power failures during boot if staggered spinup is not used.
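(The arithmetic behind that, using only the numbers in the comment above — 1.2 A per drive at 5 V and a 25 A rail limit; the function name is mine:)

```python
def rail_headroom(drives, amps_per_drive=1.2, rail_limit_amps=25.0):
    """Total 5V startup draw for `drives` drives spinning up at once,
    and the remaining headroom on the rail (negative = overloaded)."""
    total = drives * amps_per_drive
    return total, rail_limit_amps - total


# 24 drives spinning up together: 28.8 A demanded of a 25 A rail,
# i.e. -3.8 A of headroom -- hence the cutoffs without staggered spinup.
total, headroom = rail_headroom(24)
```

With staggered spinup (spacing drives out by a few seconds each), only a handful of drives draw peak current at any moment, which keeps the rail well under its limit.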
[+] rkagerer|1 year ago|reply
I have a similar-sized array which I also only power on nightly to receive backups, or occasionally when I need access to it for a week or two at a time.

It's a whitebox RAID6 running NTFS (tried ReFS, didn't like it), and has been around for 12+ years, although I've upgraded the drives a couple times (2TB --> 4TB --> 16TB) - the older Areca RAID controllers make it super simple to do this. Tools like Hard Disk Sentinel are awesome as well, to help catch drives before they fail.

I have an additional, smaller array that runs 24x7, which has been through similar upgrade cycles, plus a handful of clients with whitebox storage arrays that have lasted over a decade. Usually the client ones are more abused (poor temperature control when they delay fixing their server room A/C for months but keep cramming in new heat-generating equipment, UPS batteries not replaced diligently after staff turnover, etc...).

Do I notice a difference in drive lifespan between the ones that are mostly-off vs. the ones that are always-on? Hard to say. It's too small a sample size and possibly too much variance in 'abuse' between them. But definitely seen a failure rate differential between the ones that have been maintained and kept cool, vs. allowed to get hotter than is healthy.

I can attest those 4TB HGST drives mentioned in the article were tanks. Anecdotally, they're the most reliable ones I've ever owned. And I have a more reasonable sample size there as I was buying dozens at a time for various clients back in the day.

[+] Tepix|1 year ago|reply
Having 24 drives probably offers some performance advantages, but if you don't require them, a 6-bay NAS with 18TB disks instead would offer a ton of advantages in terms of power usage, noise, space, cost and reliability.
[+] wazoox|1 year ago|reply
I currently support many NAS servers in the 50TB - 2PB range, many of them 10 or 12 years old, and some up to 15. Most of them still run with their original power supplies, motherboards and most of their original (HGST -- now WD -- UltraStar) drives, though of course a few drives have failed for some of them (but not all).

2, 4, and 8TB HGST UltraStar disks are particularly reliable. All of my desktop PCs currently host mirrors of 2009-vintage 2 TB drives that I got when they were put out of service. I have heaps of spare, good 2 TB drives (and a few hundred still running in production after all these years).

For some reason 14TB drives seem to have a much higher failure rate than helium drives of all sizes. On a fleet of only about 40 14 TB drives, I had more failures than on a fleet of over 1,000 12 and 16 TB drives.

[+] hi_hi|1 year ago|reply
I've had the exact same NAS for over 15 years. It's had 5 hard drives replaced, 2 new enclosures and 1 new power supply, but it's still as good as new...
[+] Saris|1 year ago|reply
I'm curious what's your use case for 71TB of data where you can also shut it down most of the time?

My NAS is basically constantly in use, between video footage being dumped and then pulled for editing, uploading and editing photos, keeping my devices in sync, media streaming in the evening, and backups from my other devices at night..

[+] lostmsu|1 year ago|reply
I have a mini PC + 4x external HDDs (I always bought used) on Windows 10 with ReFS since probably 2016 (recently upgraded to Win 11), maybe earlier. I don't bother powering off.

The only time I had problems was when I tried to add a 5th disk using a USB hub, which caused drives attached to the hub to get disconnected randomly under load. This happened with 3 different hubs, so I stopped trying to expand that monstrosity and just replace drives with larger ones instead. Don't use hubs for storage; the majority of them are shitty.

Currently ~64TiB (less with redundancy).

Same as OP. No data loss, no broken drives.

A couple of years ago I also added an off-site 46TiB system with similar software, but a regular ATX with 3 or 4 internal drives because the spiderweb of mini PC + dangling USBs + power supplies for HDDs is too annoying.

I do weekly scrubs.

Some notes: https://lostmsu.github.io/ReFS/

[+] anjel|1 year ago|reply
Can't help but wonder how much electricity would have been consumed if you had left it on 24/7 for ten years...
[+] why_only_15|1 year ago|reply
I'm confused about treating 7 watts as important to optimize -- rough numbers: 7 watts is 61 kWh/year. At the US-average price of $0.16/kWh, that's about $10/year.

edit: looks like for the Netherlands (where he lives) this is more significant -- $0.50/kWh is the average price there, so ~$31/year
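(The conversion, using the 7 W figure and the two electricity prices from the comment above — the function name is mine:)

```python
def idle_cost_per_year(watts, price_per_kwh):
    """Annual energy and cost of a constant load:
    watts -> kWh/year -> cost at the given price."""
    kwh_per_year = watts * 24 * 365 / 1000  # 8760 hours in a year
    return kwh_per_year, kwh_per_year * price_per_kwh


kwh, usd = idle_cost_per_year(7, 0.16)  # ~61 kWh/year, ~$10 at US prices
_, nl = idle_cost_per_year(7, 0.50)     # ~$31/year at the quoted Dutch price
```

So whether 7 W is worth optimizing really does come down to the local price per kWh — a 3x price difference turns a rounding error into real money over a decade.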

[+] tobiasbischoff|1 year ago|reply
Let me tell you, powering these drives on and off is far more dangerous than just keeping them running. 10 years is well within the MTBF of these enterprise drives. (I worked for 10 years as an enterprise storage technician; I saw a lot of sh*.)
[+] lifeisstillgood|1 year ago|reply
My takeaway is that there is a difference between residential and industrial usage, just as there is a difference between residential car ownership and 24/7 taxi / industrial use

And that no matter how amazing the industrial revolution has been, we can build reliability at the residential level but not at the industrial level.

And certainly at the price points.

The whole “At FAANG scale” is a misnomer - we aren’t supposed to use residential quality (possibly the only quality) at that scale - maybe we are supposed to park our cars in our garages and drive them on a Sunday

Maybe we should keep our servers at home, just like we keep our insurance documents and our notebooks