
ZFS on Linux: Unlistable and disappearing files

380 points | heinrichhartman | 8 years ago | github.com

161 comments

[+] ryao|8 years ago|reply
We are working on it. We know what patch introduced the regression and 0.7.8 is going out soon to revert it. Until then, users should downgrade to 0.7.6 if they have not already. The Gentoo and EPEL maintainers have pulled the affected releases from the repositories (technically masked on Gentoo). Ubuntu was never affected.

The regression makes it so that creating a new file could fail with ENOSPC, after which files created in that directory could become orphaned. Existing files seem okay, but I have yet to confirm that myself and I cannot speak for what others know. It is incredibly difficult to reproduce on systems running coreutils 8.23 or later; so far, reports have only come from people using coreutils 8.22 or older. The directory size actually gets incremented for each orphaned file, which leaves it wrong once orphaned files exist.

We will likely have some way to recover the orphaned files (like ext4’s lost+found) and fix the directory sizes in the very near future. Snapshots of the damaged datasets are problematic though. Until we have a subcommand to fix it (not including the snapshots, which we would have to list), the damage can be removed from a system that has it either by rolling back to a snapshot before it happened or creating a new dataset with 0.7.6 (or another release other than 0.7.7), moving everything to the new dataset and destroying the old. That will restore things to pristine condition.

It should also be possible to check for pools that are affected, but I have yet to finish my analysis to be certain that no false negatives occur when checking, so I will avoid saying how for now.
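The two recovery routes described above can be sketched roughly as follows. All pool, dataset, and snapshot names (`tank/data`, `tank/data-new`, `@before-incident`) are hypothetical placeholders, and this is an illustration of the idea, not a verified runbook:

```shell
# Rough sketch of the recovery routes described above. Names are
# placeholders; take backups first, and run this on 0.7.6 (or any
# release other than 0.7.7).

# Option 1: roll back to a snapshot taken before the incident.
# (-r also destroys any snapshots newer than the rollback target.)
zfs rollback -r tank/data@before-incident

# Option 2: create a fresh dataset, move everything across, and
# destroy the damaged one.
zfs create tank/data-new
rsync -aHAX /tank/data/ /tank/data-new/   # preserve perms, hardlinks, ACLs, xattrs
zfs destroy -r tank/data                  # removes the damaged dataset and its snapshots
zfs rename tank/data-new tank/data
```

Either way the damaged dataset (and any snapshots of it) is gone afterwards, which matches the "restore things to pristine condition" outcome described.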

[+] exikyut|8 years ago|reply
> We will likely have some way to recover the orphaned files (like ext4’s lost+found) and fix the directory sizes in the very near future.

How should people behave right now?

Will normal usage of production filesystems erase data, or will read/write activity leave the potentially-orphaned files in place?

You've also mentioned snapshots being tricky in the thread. Should people stop creating snapshots in case orphaned files are not included in the snapshots?

--

> It is incredibly difficult to reproduce on systems running coreutils 8.23 or later.

IIUC:

- This is specifically because `cp` in 8.23 is optimized: 8.22 created files in {0..2000} order, while 8.23+ randomizes the order (I don't quite understand why)

- The script in https://gist.github.com/trisk/9966159914d9d5cd5772e44885112d... uses `touch` to create files in random order, and some people reported that this triggered the bug
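The reproduction approach described above (creating files in random order, then checking whether they are all listable) can be sketched like this. The 200-file count and names are illustrative, not taken from the gist; on a healthy filesystem the two counts simply match:

```shell
# Sketch of the reproduction idea described above: create files in a
# randomized order (mimicking the gist's touch loop and coreutils 8.23+
# behavior), then compare the number created against the number listable.
workdir=$(mktemp -d)
cd "$workdir"
created=0
for i in $(seq 1 200 | shuf); do
    touch "file$i"
    created=$((created + 1))
done
listed=$(ls -A | wc -l)
echo "created=$created listed=$listed"   # healthy FS: created=200 listed=200
# On an affected 0.7.7 dataset, reports say the two can diverge as
# files become unlistable/orphaned.
```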

[+] antongribok|8 years ago|reply
This is a good reminder for everyone that snapshots are not backups.

Also, thank you for all of the hard work on ZoL!

[+] BrandonH45|8 years ago|reply
Using ZoL extensively in a commercial environment - just upgraded to RHEL 7.5 with ZFS 0.7.8.1 - lost a huge 32TB pool with the exact same issue as in 0.7.7. Not an expert, but we think 0.7.8.1 didn't properly revert the regression. The issue still exists.

Upgraded from 0.7.3

Love the product and your work.

[+] herogreen|8 years ago|reply
[+] wila|8 years ago|reply
While 2.6.32-696.23.1.el6.x86_64 is an old kernel, it is actually the current kernel for CentOS/RHEL 6. Red Hat has backported a lot of fixes to it, and CentOS/RHEL 6 is a distribution that is still under maintenance.

It is still out there in a lot of places and has been in production for a while. That it is old is a feature: it also means that it is incredibly stable.

Also note that the problem is also reported on a CentOS 7.4 kernel a few posts down.

[+] ryao|8 years ago|reply
We used to support Linux 2.6.16, but support for that was dropped a long time ago. Then we had support for 2.6.26, but support was dropped for that too. 2.6.32 support will be maintained until at least 2020 if I recall correctly. I could see us maintaining it well past that too.
[+] BrandonH45|8 years ago|reply
Indeed, but we are just end consumers of RHEL's release cycles. In a large commercial environment (banking, in this case) we are bound by law to use the supported versions from commercial vendors.

My scenario is simple: lots of very stable ZFS 0.7.7 running on RHEL 7.4. We saw the media reports of the 0.7.7 bug and upgraded to 0.7.8 in fear of a catastrophe - but all hell broke loose after upgrading. We downgraded to 0.7.7 but got the new RHEL 7.5 kernel, and still everything is a mess. We rebuilt some test systems with RHEL 7.5 and 0.7.7 and still cannot even list a mount point of a brand-new zpool, without even creating any files on it.

Now we are seriously worried. We can't go back to 0.7.6 easily, and anyway the ZFS 0.7.6 modules don't load into the RHEL 7.5 kernel - not compatible. We may have to go back to RHEL 7.4 and 0.7.7, which is still stable on many systems.

You guys are all heroes for your phenomenal work. Please keep it going...

[+] mrmondo|8 years ago|reply
My thoughts exactly - while the RHEL kernel has lots of backported patches, it's by no means complete or near current.
[+] spindle|8 years ago|reply
Probably obvious, but hooray for open source software! What a fantastic response to the bug.
[+] tetha|8 years ago|reply
Yep, this is one of the big reasons why I love the open source Linux world. Something breaks - and in this case, the initial situation looks quite terrifying - and then a whole bunch of people pile in with information, reproduction steps, systems they can use for testing, ways to handle this in production, ...
[+] tinus_hn|8 years ago|reply
Imagine if this had happened to Apple; everyone and his dog would be rolling over each other vilifying them.
[+] Annatar|8 years ago|reply
Since when is introducing a major regression into something as critical as a filesystem a hooray for open source? Only on GNU/Linux... and nobody thinks twice about it or bats an eyelash; instead, the scramble to contain the regression is greeted with a hooray. It's mentality like this that makes me want to never touch a computer again.
[+] drewg123|8 years ago|reply
Is this ZoL tip or Linux specific? FWIW, the bug does not seem to reproduce on FreeBSD-current (r332158).

However, it also does not reproduce on Ubuntu 4.4.0-116-generic running the ZFS stuff from Ubuntu.

[+] aidenn0|8 years ago|reply
See the test case furthest down in the bug report. They were able to get it to reproduce on current Linux distributions. The differences between older and newer distributions had to do with coreutils changes to the `cp` implementation between 8.22 and 8.23.

Reverting this commit causes the bug to stop reproducing:

https://github.com/zfsonlinux/zfs/commit/cc63068e95ee725cce0...

[edit]

An inspection of the following makes me think this is ZoL specific.

https://github.com/freebsd/freebsd/blob/58941b0245fd8d3d5861...

[+] ryao|8 years ago|reply
Ubuntu was never affected. The regression started in 0.7.7 and Ubuntu is on 0.7.5. HEAD was affected until earlier today when the patch was reverted.

I am not sure if the bad patch was ported to the other OpenZFS platforms.

[+] hippich|8 years ago|reply
From the title, my first thought was: "Wow, this is an awesome security/privacy feature of the file system!" - and then, from reading the comments, I realised it is a bug.
[+] mafro|8 years ago|reply
Great to hear the ZoL guys are right on this. Bravo.

It also reminds me why my NAS runs Debian..

[+] sureaboutthis|8 years ago|reply
It also reminds me why my NAS runs FreeBSD.
[+] rhabarba|8 years ago|reply
So this is not a "ZFS bug", but a "ZFS on Linux" bug. The actual ZFS on systems which have had ZFS for decades is not affected at all.
[+] ryao|8 years ago|reply
It is a recent regression from 2 months ago that originated in the Linux port. I do not know the status of the other platforms. I and the others involved with the Linux port are still busy handling the issue. I suggest asking the developers of the other platforms whether they had adopted the bad patch or not.
[+] Fnoord|8 years ago|reply
> The actual ZFS on systems which have had ZFS for decades is not affected at all.

ZFS hasn't existed "for decades".

[+] beedogs|8 years ago|reply
At least it's no btrfs. What a disaster that filesystem's been.
[+] Arbalest|8 years ago|reply
A disaster for enterprise perhaps. I use only a subset of features (such as subvolumes and snapshots and a little RAID 1) and have never had problems. With the way some people talk about it, it sure sounds like it never worked at all.
[+] isnull|8 years ago|reply
I have been using btrfs in production for years now and it has never failed me. I am doing hundreds of snapshots and send/receives, I've reconfigured RAIDs on the fly, and went from a 6-disk RAID 10 of mixed sizes to a RAID 10 of same-size disks. We have in some cases had many power failures with no data loss. We also have some setups using md RAID and some using hardware RAID...

When I hear people dogging btrfs, it just speaks to their inexperience, IMO.

That said, I know it's not perfect, but what is?

[+] wasted_intel|8 years ago|reply
I've been running a two-disk RAID 1 setup, alongside a single-disk root drive (for snapshots), for almost two years without a single issue. I think there's still a lot of FUD being spread on account of the RAID 5/6 write hole still existing, for which this is the latest update:

> The write hole is the last missing part, preliminary patches have been posted but needed to be reworked.

[+] clircle|8 years ago|reply
Is there a stable (loaded word, I know) versioning file system available for Linux?
[+] newnewpdro|8 years ago|reply
Why is the thread title hyphenated? It made me expect there's a utility called "zfs-bug" causing data loss.
[+] xxpor|8 years ago|reply
The OP appears to be a native German speaker. German is much more hyphen-happy than English; it's probably the most common error I see among German speakers writing English.
[+] zorkw4rg|8 years ago|reply
Basically the reason I like my filesystems a few decades old and mature; I would not trust ZFS or btrfs with anything critical.
[+] xoa|8 years ago|reply
I have been running ZFS on macOS (OS X), illumos and Solaris for a good 7 years or so now. A major part of the reason I switched over fully, despite some warts, was that I experienced actual and significant data rot from stuff I was carrying forward under XFS (IRIX), HFS, etc. I don't consider my personal stuff to go back that long, but I still have things from 1993 or so that matter to me. I did a review around 2009ish, and found that a number of old files had at some point or another become corrupted, including ones I know for sure weren't in 2004. I'd been following ZFS somewhat since Sun had demoed it, including a depressing spell after Apple almost moved to it then quit, but it was my own personal actual losses that pushed me to move over.

End-to-end checksumming and other integrity features should just plain be universal at this point. They should have been a decade ago or more, in fact. We have so much incredibly important data now that is digital only and nowhere else, and memory and storage have both become very cheap at the general population level. It's shameful that anyone should still be losing data or experiencing anxiety years or even decades down the line. Integrity, basic levels of security/privacy, and flexible, high-integrity replication should all be native features of any data store system. "Decades old" filesystems just plain absolutely do not cut it, nor does anything newer that doesn't include those promises at least as options. Bugs are unfortunate, and I hope this prompts ZoL and associated projects under the OpenZFS umbrella to double-check their automated unit and stress tests. Sun rightly made a big deal of that on release. Even so, I wholeheartedly believe that ZFS or the like is a far better primitive for a data storage scheme than older filesystems (or many newer ones, for that matter).
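The userspace stopgap for the bit rot described above can be sketched with a checksum manifest. The file names here are illustrative, and this is a weak stand-in for what ZFS does natively on every read:

```shell
# Record checksums once, re-verify later to catch silent corruption.
# This is only a hand-rolled approximation of filesystem-level
# end-to-end checksumming; it detects rot but cannot repair it.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'world\n' > "$dir/b.txt"

# Record a checksum manifest for the tree.
(cd "$dir" && sha256sum a.txt b.txt > MANIFEST)

# Years later: re-verify. sha256sum -c exits non-zero on any mismatch;
# --quiet suppresses the per-file OK lines.
(cd "$dir" && sha256sum -c --quiet MANIFEST) \
    && echo "integrity ok"

# Simulate bit rot, then re-verify: the corruption is now detected.
printf 'h3llo\n' > "$dir/a.txt"
(cd "$dir" && sha256sum -c --quiet MANIFEST >/dev/null 2>&1) \
    || echo "corruption detected"
```

Run as-is, this prints "integrity ok" followed by "corruption detected"; the manifest has to be re-recorded after every legitimate change, which is exactly the chore a checksumming filesystem removes.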

[+] aidenn0|8 years ago|reply
Are you still using ext2, then? Ext3 was 2001, and ZFS was 2005. JFS2 is only 19 years old, so it doesn't quite meet the "decades" requirement.
[+] flukus|8 years ago|reply
Alternatively you can just avoid the bleeding edge versions of any file system.