
The NOVA filesystem

131 points | JoshTriplett | 8 years ago | lwn.net | reply

27 comments

[+] ecma|8 years ago|reply
This is super interesting but

   due to the per-CPU inode table
   structure, it is impossible to
   move a NOVA filesystem from one
   system to another if the two
   machines do not have the same
   number of CPUs.
seems like a dealbreaker. What use is a FS if it isn't portable? At least it sounds like they're very aware of this issue.

I'd be very interested to see what they end up doing to make this behave better prior to upstream consideration. I wonder if a linked list journal of inode table changes (with space drawn from per-CPU freelists) would be safe/fast enough. That could be periodically coalesced and remapped to the NOVA device's non-CPU aware inode table.

[+] kalmi10|8 years ago|reply
It's not so grim.

"The origin of the CPU-count dependence is that NOVA divides PMEM into per-CPU allocation regions. We use the current CPU ID as a hint about which region to use and avoid contention on the locks that protect it. So moving from a smaller number of CPUs to a larger number of CPUs just means more contention for the locks. Moving from a larger number to a smaller number is no problem at all. So, our current plan is to set the CPU count very high (like 256) when the file system is created." - comment under the post by one of the designers of the fs
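To illustrate why moving to a machine with more CPUs only costs contention, here's a minimal sketch (hypothetical names, not NOVA's actual code) of a per-CPU allocation scheme where the CPU ID is only a hint: the region count is fixed at filesystem-creation time, and extra CPUs simply share regions, and their locks, via a modulo mapping.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical sketch, not NOVA's actual code: the number of
 * allocation regions is fixed when the filesystem is created. */
#define NR_REGIONS 4  /* e.g. the CPU count at mkfs time */

struct alloc_region {
    pthread_mutex_t lock;   /* protects this region's free list */
    long free_blocks;
};

static struct alloc_region regions[NR_REGIONS];

/* The current CPU ID is only a *hint*: any region works, so a
 * machine with more CPUs than regions just shares regions (and
 * their locks) among several CPUs -- more contention, not breakage. */
static struct alloc_region *pick_region(int cpu_id)
{
    return &regions[cpu_id % NR_REGIONS];
}

/* Returns 0 on success, -1 if the chosen region is exhausted. */
static long alloc_block(int cpu_id)
{
    struct alloc_region *r = pick_region(cpu_id);
    long got = -1;

    pthread_mutex_lock(&r->lock);
    if (r->free_blocks > 0) {
        r->free_blocks--;
        got = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return got;
}
```

This is also why their "set the CPU count very high (like 256)" plan works: 256 regions modulo any realistic CPU count still spreads allocations widely enough to keep contention low.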

[+] microcolonel|8 years ago|reply
I don't see it as such a big problem. Most filesystem images aren't intended to be ported directly between machines in general. As long as they have a tool to read the filesystem on a different configuration, a filesystem with this restriction could still be very useful for generic bulk storage on a single node.
[+] santoshalper|8 years ago|reply
This feels like a file system designed for mobile devices, where moving the storage volume would not be a relevant concern in most cases. Based on the design, it should be more responsive, use less CPU and thus less battery.
[+] koolba|8 years ago|reply
A simple answer may be to treat it the way hash partitioning treats shards, i.e. over-allocate to start and map the over-allocation onto the physical layer.
[+] loeg|8 years ago|reply
It seems like the developers could relatively easily add a slow compatibility mode that allows accessing filesystems from systems with different numbers of CPUs. It would just impose additional locking when accessing all of the previously per-CPU data structures.
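A rough sketch of what such a compatibility mode could look like (entirely hypothetical names, not NOVA code): compare the CPU count stored in the superblock against the machine's at mount time, and if they differ, route every access to the per-CPU structures through one big lock -- slow, but correct on any CPU count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sketch of the suggested compatibility mode. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static bool compat_mode;       /* decided once, at mount time */
static int on_disk_cpu_count;  /* as recorded in the superblock */

static void mount_check(int online_cpus)
{
    /* Take the lock-free fast path only when the on-disk
     * per-CPU layout matches this machine. */
    compat_mode = (online_cpus != on_disk_cpu_count);
}

/* In compat mode every formerly per-CPU operation runs under
 * one global lock instead of its own per-CPU lock. */
static void inode_table_op(void (*op)(void))
{
    if (compat_mode)
        pthread_mutex_lock(&global_lock);
    op();
    if (compat_mode)
        pthread_mutex_unlock(&global_lock);
}
```

The appeal is that the common case (same machine, same CPU count) pays nothing, and only a transplanted filesystem eats the serialization cost until it can be rewritten into the new layout.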
[+] laythea|8 years ago|reply
Yeah this would be a deal-breaker. I can imagine the devs will be working on a way to "convert" one to the other, but I can also imagine that comes way down the priority list, so I am not disturbed by this fact, at this time.

I assume (after a 5-minute scan, so I could be wrong) that in order to speed things up, information about the CPUs is embedded in the filesystem.

So is this not a classic "hardcode for speed" vs "generic but slow" trade off?

[+] nl|8 years ago|reply
It's interesting that the comments here all assume one should think of this as a disk, as opposed to a non-volatile RAM disk.

No one thinks twice about having to "do something" with their RAM disk to upgrade the CPU.

It's time we realized the new class of non-volatile memory hardware is a new entry in the storage hierarchy. One shouldn't think of it as RAM, but it's much, much faster than traditional disks (or even SSDs) and should be thought of as something different.

It's a mistake to put the same restrictions on the potential performance of this just to make it fit our preconceived notions of storage.

[+] _jal|8 years ago|reply
Agreed. It is surprisingly hard to lose some of the assumptions spinning rust has left us with. Treating this as a log-structured 'soup', I think, opens up interesting possibilities.

Interop is vital, though. I'm less concerned with being able to pull and seat a drive between machines[1] than sensible replication/backup, and a 'send/receive' type-thing shouldn't be too big of a deal.

[1] Thumb drives are an obvious exception.

[+] ecma|8 years ago|reply
I would definitely expect a separate piece of hardware to be swappable without impacting the rest of my machine. I can change/upgrade the CPU without breaking the north bridge or losing my RAID mapping, so why should the data storage mechanism on my new NVMe device (which is where my mind leaps for NOVA) be any different?

What type of usage am I missing where losing your data, or having to do an offline rebuild of it, would be acceptable during a hardware upgrade? NOVA seems to be aiming to address the recognised issue by overprovisioning (which I disagree with, but such is life). I don't really understand your argument.